All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/23] PCI: Allow BAR movement during hotplug
@ 2019-08-16 16:50 ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If the firmware or kernel has arranged memory for PCIe devices in a way
that doesn't provide enough space for BARs of a new hotplugged device, the
kernel can pause the drivers of the "obstructing" devices and move their
BARs, so the new BARs can fit into the freed spaces.

To rearrange the BARs and bridge windows these patches releases all of them
after a rescan and re-assigns in the same way as during the initial PCIe
topology scan at system boot.

When a driver is un-paused by the kernel after the PCIe rescan, it should
check if its BARs had moved, and ioremap() them.

Drivers indicate their support of the feature by implementing the new hooks
.rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
doesn't yet support the feature, BARs of its devices will be considered as
immovable (by checking the pci_dev_movable_bars_supported(dev)) and handled
in the same way as resources with the IORESOURCE_PCI_FIXED flag.

If a driver doesn't yet support the feature, its devices are guaranteed to
have their BARs remaining untouched.

Tested on:
 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer";
 - POWER8 PowerNV+OPAL+PHB3 ppc64le with [1] applied and the following:
   "pci=realloc,pcie_bus_peer2peer";
 - both platforms [with extra pacthes (yet to be submitted) for movable bus
   numbers]: manually initiated (via sysfs) rescan has found and turned on
   a hotplugged bridge.

Not so many platforms and test cases were covered, so all who are
interested are highly welcome to test on your setups - the more exotic the
better!

This patchset is a part of our work on adding support for hotplugging
bridges full of other bridges, NVME drives, SAS HBAs and GPUs without
special requirements such as Hot-Plug Controller, reservation of bus
numbers or memory regions by firmware, etc. The next patchset to submit
will implement the movable bus numbers.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-August/195272.html
    [PATCH v6 0/5] powerpc/powernv/pci: Make hotplug self-sufficient, independent of FW and DT

Changes since v4:
 - Feature is enabled by default (turned on by one of the latest patches);
 - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
   BARs with the IORESOURCE_PCI_FIXED flag;
 - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
   configured not only during system boot or pcihp events;
 - Allow movement of switch's BARs if claimed by portdrv;
 - Update EEH address caches after rescan for powerpc;
 - Don't disable completely hot-added devices which can't have BARs being
   fit - just disable their BARs, so they are still visible in lspci etc;
 - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
   realloc_range;
 - Drop the patch for pci_restore_config_space() - fixed by properly using
   the runtime PM.

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergey Miroshnichenko (23):
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: hotplug: Add a flag for the movable BARs feature
  PCI: Define PCI-specific version of the release_child_resources()
  PCI: hotplug: movable BARs: Fix reassigning the released bridge
    windows
  PCI: hotplug: movable BARs: Recalculate all bridge windows during
    rescan
  PCI: hotplug: movable BARs: Don't allow added devices to steal
    resources
  PCI: Include fixed and immovable BARs into the bus size calculating
  PCI: Prohibit assigning BARs and bridge windows to non-direct parents
  PCI: hotplug: movable BARs: Try to assign unassigned resources only
    once
  PCI: hotplug: movable BARs: Calculate immovable parts of bridge
    windows
  PCI: hotplug: movable BARs: Compute limits for relocated bridge
    windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: Fix assigning the fixed prefetchable resources
  PCI: hotplug: movable BARs: Assign fixed and immovable BARs before
    others
  PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
  powerpc/pci: Fix crash with enabled movable BARs
  powerpc/pci: Handle BAR movement
  PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
  PCI: hotplug: movable BARs: Enable the feature by default
  nvme-pci: Handle movable BARs
  PCI/portdrv: Declare support of movable BARs
  PCI: pciehp: movable BARs: Trigger a domain rescan on hp events

 .../admin-guide/kernel-parameters.txt         |   7 +
 arch/powerpc/kernel/pci-hotplug.c             |  10 +
 arch/powerpc/platforms/powernv/pci-ioda.c     |   3 +-
 drivers/nvme/host/pci.c                       |  21 +-
 drivers/pci/bus.c                             |   2 +-
 drivers/pci/hotplug/pciehp_pci.c              |   5 +
 drivers/pci/pci.c                             |  58 +++-
 drivers/pci/pci.h                             |  30 ++
 drivers/pci/pcie/portdrv_pci.c                |  11 +
 drivers/pci/probe.c                           | 295 +++++++++++++++++-
 drivers/pci/setup-bus.c                       | 276 +++++++++++++---
 drivers/pci/setup-res.c                       |  48 ++-
 include/linux/pci.h                           |  21 ++
 13 files changed, 739 insertions(+), 48 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5 00/23] PCI: Allow BAR movement during hotplug
@ 2019-08-16 16:50 ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

If the firmware or kernel has arranged memory for PCIe devices in a way
that doesn't provide enough space for BARs of a new hotplugged device, the
kernel can pause the drivers of the "obstructing" devices and move their
BARs, so the new BARs can fit into the freed spaces.

To rearrange the BARs and bridge windows these patches releases all of them
after a rescan and re-assigns in the same way as during the initial PCIe
topology scan at system boot.

When a driver is un-paused by the kernel after the PCIe rescan, it should
check if its BARs had moved, and ioremap() them.

Drivers indicate their support of the feature by implementing the new hooks
.rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
doesn't yet support the feature, BARs of its devices will be considered as
immovable (by checking the pci_dev_movable_bars_supported(dev)) and handled
in the same way as resources with the IORESOURCE_PCI_FIXED flag.

If a driver doesn't yet support the feature, its devices are guaranteed to
have their BARs remaining untouched.

Tested on:
 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer";
 - POWER8 PowerNV+OPAL+PHB3 ppc64le with [1] applied and the following:
   "pci=realloc,pcie_bus_peer2peer";
 - both platforms [with extra pacthes (yet to be submitted) for movable bus
   numbers]: manually initiated (via sysfs) rescan has found and turned on
   a hotplugged bridge.

Not so many platforms and test cases were covered, so all who are
interested are highly welcome to test on your setups - the more exotic the
better!

This patchset is a part of our work on adding support for hotplugging
bridges full of other bridges, NVME drives, SAS HBAs and GPUs without
special requirements such as Hot-Plug Controller, reservation of bus
numbers or memory regions by firmware, etc. The next patchset to submit
will implement the movable bus numbers.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-August/195272.html
    [PATCH v6 0/5] powerpc/powernv/pci: Make hotplug self-sufficient, independent of FW and DT

Changes since v4:
 - Feature is enabled by default (turned on by one of the latest patches);
 - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
   BARs with the IORESOURCE_PCI_FIXED flag;
 - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
   configured not only during system boot or pcihp events;
 - Allow movement of switch's BARs if claimed by portdrv;
 - Update EEH address caches after rescan for powerpc;
 - Don't disable completely hot-added devices which can't have BARs being
   fit - just disable their BARs, so they are still visible in lspci etc;
 - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
   realloc_range;
 - Drop the patch for pci_restore_config_space() - fixed by properly using
   the runtime PM.

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergey Miroshnichenko (23):
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: hotplug: Add a flag for the movable BARs feature
  PCI: Define PCI-specific version of the release_child_resources()
  PCI: hotplug: movable BARs: Fix reassigning the released bridge
    windows
  PCI: hotplug: movable BARs: Recalculate all bridge windows during
    rescan
  PCI: hotplug: movable BARs: Don't allow added devices to steal
    resources
  PCI: Include fixed and immovable BARs into the bus size calculating
  PCI: Prohibit assigning BARs and bridge windows to non-direct parents
  PCI: hotplug: movable BARs: Try to assign unassigned resources only
    once
  PCI: hotplug: movable BARs: Calculate immovable parts of bridge
    windows
  PCI: hotplug: movable BARs: Compute limits for relocated bridge
    windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: Fix assigning the fixed prefetchable resources
  PCI: hotplug: movable BARs: Assign fixed and immovable BARs before
    others
  PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
  powerpc/pci: Fix crash with enabled movable BARs
  powerpc/pci: Handle BAR movement
  PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
  PCI: hotplug: movable BARs: Enable the feature by default
  nvme-pci: Handle movable BARs
  PCI/portdrv: Declare support of movable BARs
  PCI: pciehp: movable BARs: Trigger a domain rescan on hp events

 .../admin-guide/kernel-parameters.txt         |   7 +
 arch/powerpc/kernel/pci-hotplug.c             |  10 +
 arch/powerpc/platforms/powernv/pci-ioda.c     |   3 +-
 drivers/nvme/host/pci.c                       |  21 +-
 drivers/pci/bus.c                             |   2 +-
 drivers/pci/hotplug/pciehp_pci.c              |   5 +
 drivers/pci/pci.c                             |  58 +++-
 drivers/pci/pci.h                             |  30 ++
 drivers/pci/pcie/portdrv_pci.c                |  11 +
 drivers/pci/probe.c                           | 295 +++++++++++++++++-
 drivers/pci/setup-bus.c                       | 276 +++++++++++++---
 drivers/pci/setup-res.c                       |  48 ++-
 include/linux/pci.h                           |  21 ++
 13 files changed, 739 insertions(+), 48 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, Srinath Mannam,
	Marta Rybczynska

This is a yet another approach to fix an old [1-2] concurrency issue, when:
 - two or more devices are being hot-added into a bridge which was
   initially empty;
 - a bridge with two or more devices is being hot-added;
 - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.

The problem is that a bridge is reported as enabled before the MEM/IO bits
are actually written to the PCI_COMMAND register, so another driver thread
starts memory requests through the not-yet-enabled bridge:

 CPU0                                        CPU1

 pci_enable_device_mem()                     pci_enable_device_mem()
   pci_enable_bridge()                         pci_enable_bridge()
     pci_is_enabled()
       return false;
     atomic_inc_return(enable_cnt)
     Start actual enabling the bridge
     ...                                         pci_is_enabled()
     ...                                           return true;
     ...                                     Start memory requests <-- FAIL
     ...
     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
while enabling upstream bridges"), but adding a per-device mutexes and
preventing the dev->enable_cnt from from incrementing early.

CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Marta Rybczynska <mrybczyn@kalray.eu>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>

[1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
    [RFC PATCH] nvme: avoid race-conditions when enabling devices
---
 drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1b27b5af3d55..e7f8c354e644 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	struct pci_dev *bridge;
 	int retval;
 
+	mutex_lock(&dev->enable_mutex);
+
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
 		pci_enable_bridge(bridge);
@@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	if (pci_is_enabled(dev)) {
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
+		mutex_unlock(&dev->enable_mutex);
 		return;
 	}
 
@@ -1660,11 +1663,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_err(dev, "Error enabling bridge (%d), continuing\n",
 			retval);
 	pci_set_master(dev);
+	mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
+	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
+	bool need_lock = !dev->subordinate;
 	int err;
 	int i, bars = 0;
 
@@ -1680,8 +1686,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
 	}
 
-	if (atomic_inc_return(&dev->enable_cnt) > 1)
+	if (need_lock)
+		mutex_lock(&dev->enable_mutex);
+	if (pci_is_enabled(dev)) {
+		if (need_lock)
+			mutex_unlock(&dev->enable_mutex);
 		return 0;		/* already enabled */
+	}
 
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
@@ -1696,8 +1707,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 			bars |= (1 << i);
 
 	err = do_pci_enable_device(dev, bars);
-	if (err < 0)
-		atomic_dec(&dev->enable_cnt);
+	if (err >= 0)
+		atomic_inc(&dev->enable_cnt);
+	if (need_lock)
+		mutex_unlock(&dev->enable_mutex);
 	return err;
 }
 
@@ -1941,15 +1954,20 @@ void pci_disable_device(struct pci_dev *dev)
 	if (dr)
 		dr->enabled = 0;
 
+	mutex_lock(&dev->enable_mutex);
 	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
 		      "disabling already-disabled device");
 
-	if (atomic_dec_return(&dev->enable_cnt) != 0)
+	if (atomic_dec_return(&dev->enable_cnt) != 0) {
+		mutex_unlock(&dev->enable_mutex);
 		return;
+	}
 
 	do_pci_disable_device(dev);
 
 	dev->is_busmaster = 0;
+
+	mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3c7338fad86..2e58ece820e8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2427,6 +2427,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	mutex_init(&dev->enable_mutex);
 
 	return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..d3a72159722d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -425,6 +425,7 @@ struct pci_dev {
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
+	struct mutex	enable_mutex;
 
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Marta Rybczynska, Sergey Miroshnichenko, Srinath Mannam,
	Bjorn Helgaas, linux

This is a yet another approach to fix an old [1-2] concurrency issue, when:
 - two or more devices are being hot-added into a bridge which was
   initially empty;
 - a bridge with two or more devices is being hot-added;
 - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.

The problem is that a bridge is reported as enabled before the MEM/IO bits
are actually written to the PCI_COMMAND register, so another driver thread
starts memory requests through the not-yet-enabled bridge:

 CPU0                                        CPU1

 pci_enable_device_mem()                     pci_enable_device_mem()
   pci_enable_bridge()                         pci_enable_bridge()
     pci_is_enabled()
       return false;
     atomic_inc_return(enable_cnt)
     Start actual enabling the bridge
     ...                                         pci_is_enabled()
     ...                                           return true;
     ...                                     Start memory requests <-- FAIL
     ...
     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
while enabling upstream bridges"), but adding a per-device mutexes and
preventing the dev->enable_cnt from from incrementing early.

CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Marta Rybczynska <mrybczyn@kalray.eu>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>

[1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
    [RFC PATCH] nvme: avoid race-conditions when enabling devices
---
 drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1b27b5af3d55..e7f8c354e644 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	struct pci_dev *bridge;
 	int retval;
 
+	mutex_lock(&dev->enable_mutex);
+
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
 		pci_enable_bridge(bridge);
@@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	if (pci_is_enabled(dev)) {
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
+		mutex_unlock(&dev->enable_mutex);
 		return;
 	}
 
@@ -1660,11 +1663,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_err(dev, "Error enabling bridge (%d), continuing\n",
 			retval);
 	pci_set_master(dev);
+	mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
+	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
+	bool need_lock = !dev->subordinate;
 	int err;
 	int i, bars = 0;
 
@@ -1680,8 +1686,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
 	}
 
-	if (atomic_inc_return(&dev->enable_cnt) > 1)
+	if (need_lock)
+		mutex_lock(&dev->enable_mutex);
+	if (pci_is_enabled(dev)) {
+		if (need_lock)
+			mutex_unlock(&dev->enable_mutex);
 		return 0;		/* already enabled */
+	}
 
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
@@ -1696,8 +1707,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 			bars |= (1 << i);
 
 	err = do_pci_enable_device(dev, bars);
-	if (err < 0)
-		atomic_dec(&dev->enable_cnt);
+	if (err >= 0)
+		atomic_inc(&dev->enable_cnt);
+	if (need_lock)
+		mutex_unlock(&dev->enable_mutex);
 	return err;
 }
 
@@ -1941,15 +1954,20 @@ void pci_disable_device(struct pci_dev *dev)
 	if (dr)
 		dr->enabled = 0;
 
+	mutex_lock(&dev->enable_mutex);
 	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
 		      "disabling already-disabled device");
 
-	if (atomic_dec_return(&dev->enable_cnt) != 0)
+	if (atomic_dec_return(&dev->enable_cnt) != 0) {
+		mutex_unlock(&dev->enable_mutex);
 		return;
+	}
 
 	do_pci_disable_device(dev);
 
 	dev->is_busmaster = 0;
+
+	mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3c7338fad86..2e58ece820e8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2427,6 +2427,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	mutex_init(&dev->enable_mutex);
 
 	return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..d3a72159722d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -425,6 +425,7 @@ struct pci_dev {
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
+	struct mutex	enable_mutex;
 
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
updated not only when enabling the bridge for the first time, but also if a
hotplugged device requests these types of resources.

Originally these bits were set by the pci_enable_device_flags() only, which
exits early if the bridge is already pci_is_enabled(). So if the bridge was
empty initially (an edge case), then hotplugged devices fail to IO/MEM.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7f8c354e644..61d951766087 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1652,6 +1652,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_enable_bridge(bridge);
 
 	if (pci_is_enabled(dev)) {
+		int i, bars = 0;
+
+		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
+			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
+				bars |= (1 << i);
+		}
+		do_pci_enable_device(dev, bars);
+
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
 		mutex_unlock(&dev->enable_mutex);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
updated not only when enabling the bridge for the first time, but also if a
hotplugged device requests these types of resources.

Originally these bits were set by the pci_enable_device_flags() only, which
exits early if the bridge is already pci_is_enabled(). So if the bridge was
empty initially (an edge case), then hotplugged devices fail to IO/MEM.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7f8c354e644..61d951766087 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1652,6 +1652,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_enable_bridge(bridge);
 
 	if (pci_is_enabled(dev)) {
+		int i, bars = 0;
+
+		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
+			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
+				bars |= (1 << i);
+		}
+		do_pci_enable_device(dev, bars);
+
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
 		mutex_unlock(&dev->enable_mutex);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, Sam Bobroff,
	Rajat Jain, Lukas Wunner, Oliver O'Halloran, David Laight

When hot-adding a device, the bridge may have windows not big enough (or
fragmented too much) for newly requested BARs to fit in. And expanding
these bridge windows may be impossible because blocked by "neighboring"
BARs and bridge windows.

Still, it may be possible to allocate a memory region for new BARs with the
following procedure:

1) notify all the drivers which support movable BARs to pause and release
   the BARs; the rest of the drivers are guaranteed that their devices will
   not get BARs moved;

2) release all the bridge windows except of root bridges;

3) try to recalculate new bridge windows that will fit all the BAR types:
   - fixed;
   - immovable;
   - movable;
   - newly requested by hot-added devices;

4) if the previous step fails, disable BARs for one of the hot-added
   devices and retry from step 3;

5) notify the drivers, so they remap BARs and resume.

This makes the prior reservation of memory by BIOS/bootloader/firmware not
required anymore for the PCI hotplug.

Drivers indicate their support of movable BARs by implementing the new
.rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
device's activity must be paused during a rescan, and iounmap()+ioremap()
must be applied to every used BAR.

The platform also may need to prepare to BAR movement, so new hooks added:
pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).

This patch is a preparation for future patches with actual implementation,
and for now it just does the following:
 - declares the feature;
 - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
 - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
 - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
 - adds the PCI_IMMOVABLE_BARS flag.

The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
patch of the series. It can be overridden per-arch using this flag or by
the following command line option:

    pcie_movable_bars={ off | force }

CC: Sam Bobroff <sbobroff@linux.ibm.com>
CC: Rajat Jain <rajatja@google.com>
CC: Lukas Wunner <lukas@wunner.de>
CC: Oliver O'Halloran <oohall@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 drivers/pci/pci-driver.c                      |  2 +
 drivers/pci/pci.c                             | 24 ++++++
 drivers/pci/pci.h                             |  2 +
 drivers/pci/probe.c                           | 86 ++++++++++++++++++-
 include/linux/pci.h                           |  7 ++
 6 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..e2274ee87a35 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3526,6 +3526,13 @@
 		nomsi	Do not use MSI for native PCIe PME signaling (this makes
 			all PCIe root ports use INTx for all services).
 
+	pcie_movable_bars=[PCIE]
+			Override the movable BARs support detection:
+		off
+			Disable even if supported by the platform
+		force
+			Enable even if not explicitly declared as supported
+
 	pcmv=		[HW,PCMCIA] BadgePAD 4
 
 	pd_ignore_unused
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..d11909e79263 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
 {
 	int ret;
 
+	pci_add_flags(PCI_IMMOVABLE_BARS);
+
 	ret = bus_register(&pci_bus_type);
 	if (ret)
 		return ret;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 61d951766087..3a504f58ac60 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static bool pcie_movable_bars_off;
+static bool pcie_movable_bars_force;
+static int __init pcie_movable_bars_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		pcie_movable_bars_off = true;
+	else if (!strcmp(str, "force"))
+		pcie_movable_bars_force = true;
+	return 1;
+}
+__setup("pcie_movable_bars=", pcie_movable_bars_setup);
+
+bool pci_movable_bars_enabled(void)
+{
+	if (pcie_movable_bars_off)
+		return false;
+
+	if (pcie_movable_bars_force)
+		return true;
+
+	return !pci_has_flag(PCI_IMMOVABLE_BARS);
+}
+EXPORT_SYMBOL(pci_movable_bars_enabled);
+
 /* Time to wait after a reset for device to become responsive */
 #define PCIE_RESET_READY_POLL_MS 60000
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d22d1b807701..be7acc477c64 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
 
+bool pci_dev_movable_bars_supported(struct pci_dev *dev);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2e58ece820e8..60e3b48d2251 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+bool pci_dev_movable_bars_supported(struct pci_dev *dev)
+{
+	if (!dev)
+		return false;
+
+	if (dev->driver && dev->driver->rescan_prepare)
+		return true;
+
+	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+		return false;
+
+	return !dev->driver;
+}
+
+void __weak pcibios_rescan_prepare(struct pci_dev *dev)
+{
+}
+
+void __weak pcibios_rescan_done(struct pci_dev *dev)
+{
+}
+
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (bus->self)
+		pci_config_pm_runtime_get(bus->self);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child)
+			pci_bus_rescan_prepare(child);
+
+		if (dev->driver &&
+		    dev->driver->rescan_prepare) {
+			dev->driver->rescan_prepare(dev);
+			pcibios_rescan_prepare(dev);
+		} else if (pci_dev_movable_bars_supported(dev)) {
+			pcibios_rescan_prepare(dev);
+		}
+	}
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (dev->driver &&
+		    dev->driver->rescan_done) {
+			pcibios_rescan_done(dev);
+			dev->driver->rescan_done(dev);
+		} else if (pci_dev_movable_bars_supported(dev)) {
+			pcibios_rescan_done(dev);
+		}
+
+		if (child)
+			pci_bus_rescan_done(child);
+	}
+
+	if (bus->self)
+		pci_config_pm_runtime_put(bus->self);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
+	struct pci_bus *root = bus;
+
+	while (!pci_is_root_bus(root))
+		root = root->parent;
+
+	if (pci_movable_bars_enabled()) {
+		pci_bus_rescan_prepare(root);
+
+		max = pci_scan_child_bus(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		pci_bus_rescan_done(root);
+	} else {
+		max = pci_scan_child_bus(bus);
+		pci_assign_unassigned_bus_resources(bus);
+	}
 
-	max = pci_scan_child_bus(bus);
-	pci_assign_unassigned_bus_resources(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d3a72159722d..e5b5eff05744 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -838,6 +838,8 @@ struct pci_driver {
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+	void (*rescan_prepare)(struct pci_dev *dev);
+	void (*rescan_done)(struct pci_dev *dev);
 	const struct pci_error_handlers *err_handler;
 	const struct attribute_group **groups;
 	struct device_driver	driver;
@@ -924,6 +926,7 @@ enum {
 	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
 	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
 	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
+	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */
 };
 
 /* These external functions are only available when PCI support is enabled */
@@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
 void pci_lock_rescan_remove(void);
 void pci_unlock_rescan_remove(void);
 
+void pcibios_rescan_prepare(struct pci_dev *dev);
+void pcibios_rescan_done(struct pci_dev *dev);
+
 /* Vital Product Data routines */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
 ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
@@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
+bool pci_movable_bars_enabled(void);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: David Laight, Sam Bobroff, Sergey Miroshnichenko, linux,
	Lukas Wunner, Bjorn Helgaas, Oliver O'Halloran, Rajat Jain

When hot-adding a device, the bridge may have windows not big enough (or
fragmented too much) for newly requested BARs to fit in. And expanding
these bridge windows may be impossible because blocked by "neighboring"
BARs and bridge windows.

Still, it may be possible to allocate a memory region for new BARs with the
following procedure:

1) notify all the drivers which support movable BARs to pause and release
   the BARs; the rest of the drivers are guaranteed that their devices will
   not get BARs moved;

2) release all the bridge windows except of root bridges;

3) try to recalculate new bridge windows that will fit all the BAR types:
   - fixed;
   - immovable;
   - movable;
   - newly requested by hot-added devices;

4) if the previous step fails, disable BARs for one of the hot-added
   devices and retry from step 3;

5) notify the drivers, so they remap BARs and resume.

This makes the prior reservation of memory by BIOS/bootloader/firmware not
required anymore for the PCI hotplug.

Drivers indicate their support of movable BARs by implementing the new
.rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
device's activity must be paused during a rescan, and iounmap()+ioremap()
must be applied to every used BAR.

The platform also may need to prepare to BAR movement, so new hooks added:
pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).

This patch is a preparation for future patches with actual implementation,
and for now it just does the following:
 - declares the feature;
 - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
 - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
 - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
 - adds the PCI_IMMOVABLE_BARS flag.

The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
patch of the series. It can be overridden per-arch using this flag or by
the following command line option:

    pcie_movable_bars={ off | force }

CC: Sam Bobroff <sbobroff@linux.ibm.com>
CC: Rajat Jain <rajatja@google.com>
CC: Lukas Wunner <lukas@wunner.de>
CC: Oliver O'Halloran <oohall@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 drivers/pci/pci-driver.c                      |  2 +
 drivers/pci/pci.c                             | 24 ++++++
 drivers/pci/pci.h                             |  2 +
 drivers/pci/probe.c                           | 86 ++++++++++++++++++-
 include/linux/pci.h                           |  7 ++
 6 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..e2274ee87a35 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3526,6 +3526,13 @@
 		nomsi	Do not use MSI for native PCIe PME signaling (this makes
 			all PCIe root ports use INTx for all services).
 
+	pcie_movable_bars=[PCIE]
+			Override the movable BARs support detection:
+		off
+			Disable even if supported by the platform
+		force
+			Enable even if not explicitly declared as supported
+
 	pcmv=		[HW,PCMCIA] BadgePAD 4
 
 	pd_ignore_unused
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..d11909e79263 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
 {
 	int ret;
 
+	pci_add_flags(PCI_IMMOVABLE_BARS);
+
 	ret = bus_register(&pci_bus_type);
 	if (ret)
 		return ret;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 61d951766087..3a504f58ac60 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static bool pcie_movable_bars_off;
+static bool pcie_movable_bars_force;
+static int __init pcie_movable_bars_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		pcie_movable_bars_off = true;
+	else if (!strcmp(str, "force"))
+		pcie_movable_bars_force = true;
+	return 1;
+}
+__setup("pcie_movable_bars=", pcie_movable_bars_setup);
+
+bool pci_movable_bars_enabled(void)
+{
+	if (pcie_movable_bars_off)
+		return false;
+
+	if (pcie_movable_bars_force)
+		return true;
+
+	return !pci_has_flag(PCI_IMMOVABLE_BARS);
+}
+EXPORT_SYMBOL(pci_movable_bars_enabled);
+
 /* Time to wait after a reset for device to become responsive */
 #define PCIE_RESET_READY_POLL_MS 60000
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d22d1b807701..be7acc477c64 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
 
+bool pci_dev_movable_bars_supported(struct pci_dev *dev);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2e58ece820e8..60e3b48d2251 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+bool pci_dev_movable_bars_supported(struct pci_dev *dev)
+{
+	if (!dev)
+		return false;
+
+	if (dev->driver && dev->driver->rescan_prepare)
+		return true;
+
+	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+		return false;
+
+	return !dev->driver;
+}
+
+void __weak pcibios_rescan_prepare(struct pci_dev *dev)
+{
+}
+
+void __weak pcibios_rescan_done(struct pci_dev *dev)
+{
+}
+
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (bus->self)
+		pci_config_pm_runtime_get(bus->self);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child)
+			pci_bus_rescan_prepare(child);
+
+		if (dev->driver &&
+		    dev->driver->rescan_prepare) {
+			dev->driver->rescan_prepare(dev);
+			pcibios_rescan_prepare(dev);
+		} else if (pci_dev_movable_bars_supported(dev)) {
+			pcibios_rescan_prepare(dev);
+		}
+	}
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (dev->driver &&
+		    dev->driver->rescan_done) {
+			pcibios_rescan_done(dev);
+			dev->driver->rescan_done(dev);
+		} else if (pci_dev_movable_bars_supported(dev)) {
+			pcibios_rescan_done(dev);
+		}
+
+		if (child)
+			pci_bus_rescan_done(child);
+	}
+
+	if (bus->self)
+		pci_config_pm_runtime_put(bus->self);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
+	struct pci_bus *root = bus;
+
+	while (!pci_is_root_bus(root))
+		root = root->parent;
+
+	if (pci_movable_bars_enabled()) {
+		pci_bus_rescan_prepare(root);
+
+		max = pci_scan_child_bus(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		pci_bus_rescan_done(root);
+	} else {
+		max = pci_scan_child_bus(bus);
+		pci_assign_unassigned_bus_resources(bus);
+	}
 
-	max = pci_scan_child_bus(bus);
-	pci_assign_unassigned_bus_resources(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d3a72159722d..e5b5eff05744 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -838,6 +838,8 @@ struct pci_driver {
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+	void (*rescan_prepare)(struct pci_dev *dev);
+	void (*rescan_done)(struct pci_dev *dev);
 	const struct pci_error_handlers *err_handler;
 	const struct attribute_group **groups;
 	struct device_driver	driver;
@@ -924,6 +926,7 @@ enum {
 	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
 	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
 	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
+	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */
 };
 
 /* These external functions are only available when PCI support is enabled */
@@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
 void pci_lock_rescan_remove(void);
 void pci_unlock_rescan_remove(void);
 
+void pcibios_rescan_prepare(struct pci_dev *dev);
+void pcibios_rescan_done(struct pci_dev *dev);
+
 /* Vital Product Data routines */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
 ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
@@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
+bool pci_movable_bars_enabled(void);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 04/23] PCI: Define PCI-specific version of the release_child_resources()
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If release the bridge resources with standard release_child_resources(), it
drops the .start field of children's BARs to zero, but with the STARTALIGN
flag remaining set, which makes the resource invalid for reassignment.

Some resources must preserve their offset and size: those marked with the
PCI_FIXED and the immovable ones - which are bound by drivers without
support of the movable BARs feature.

Add the pci_release_child_resources() to replace release_child_resources()
in handling the described PCI-specific cases.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 54 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1fa6519be..6cb8b293c576 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1482,6 +1482,55 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 	 IORESOURCE_MEM_64)
 
+/*
+ * Similar to generic release_child_resources(), but aware of immovable BARs and
+ * PCI_FIXED and STARTALIGN flags
+ */
+static void pci_release_child_resources(struct pci_bus *bus, struct resource *r)
+{
+	struct pci_dev *dev;
+
+	if (!bus || !r)
+		return;
+
+	if (r->flags & IORESOURCE_PCI_FIXED)
+		return;
+
+	r->child = NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+			struct resource *tmp = &dev->resource[i];
+			resource_size_t size = resource_size(tmp);
+
+			if (!tmp->flags || tmp->parent != r)
+				continue;
+
+			tmp->parent = NULL;
+			tmp->sibling = NULL;
+
+			pci_release_child_resources(dev->subordinate, tmp);
+
+			if ((tmp->flags & IORESOURCE_PCI_FIXED) ||
+			    !pci_dev_movable_bars_supported(dev)) {
+				pci_dbg(dev, "release immovable %pR (%s), keep its flags, base and size\n",
+					tmp, tmp->name);
+				continue;
+			}
+
+			pci_dbg(dev, "release %pR (%s)\n", tmp, tmp->name);
+
+			tmp->start = 0;
+			tmp->end = size - 1;
+
+			tmp->flags &= ~IORESOURCE_STARTALIGN;
+			tmp->flags |= IORESOURCE_SIZEALIGN;
+		}
+	}
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -1522,7 +1571,10 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		return;
 
 	/* If there are children, release them all */
-	release_child_resources(r);
+	if (pci_movable_bars_enabled())
+		pci_release_child_resources(bus, r);
+	else
+		release_child_resources(r);
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 04/23] PCI: Define PCI-specific version of the release_child_resources()
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

If release the bridge resources with standard release_child_resources(), it
drops the .start field of children's BARs to zero, but with the STARTALIGN
flag remaining set, which makes the resource invalid for reassignment.

Some resources must preserve their offset and size: those marked with the
PCI_FIXED and the immovable ones - which are bound by drivers without
support of the movable BARs feature.

Add the pci_release_child_resources() to replace release_child_resources()
in handling the described PCI-specific cases.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 54 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1fa6519be..6cb8b293c576 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1482,6 +1482,55 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 	 IORESOURCE_MEM_64)
 
+/*
+ * Similar to generic release_child_resources(), but aware of immovable BARs and
+ * PCI_FIXED and STARTALIGN flags
+ */
+static void pci_release_child_resources(struct pci_bus *bus, struct resource *r)
+{
+	struct pci_dev *dev;
+
+	if (!bus || !r)
+		return;
+
+	if (r->flags & IORESOURCE_PCI_FIXED)
+		return;
+
+	r->child = NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+			struct resource *tmp = &dev->resource[i];
+			resource_size_t size = resource_size(tmp);
+
+			if (!tmp->flags || tmp->parent != r)
+				continue;
+
+			tmp->parent = NULL;
+			tmp->sibling = NULL;
+
+			pci_release_child_resources(dev->subordinate, tmp);
+
+			if ((tmp->flags & IORESOURCE_PCI_FIXED) ||
+			    !pci_dev_movable_bars_supported(dev)) {
+				pci_dbg(dev, "release immovable %pR (%s), keep its flags, base and size\n",
+					tmp, tmp->name);
+				continue;
+			}
+
+			pci_dbg(dev, "release %pR (%s)\n", tmp, tmp->name);
+
+			tmp->start = 0;
+			tmp->end = size - 1;
+
+			tmp->flags &= ~IORESOURCE_STARTALIGN;
+			tmp->flags |= IORESOURCE_SIZEALIGN;
+		}
+	}
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -1522,7 +1571,10 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		return;
 
 	/* If there are children, release them all */
-	release_child_resources(r);
+	if (pci_movable_bars_enabled())
+		pci_release_child_resources(bus, r);
+	else
+		release_child_resources(r);
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 05/23] PCI: hotplug: movable BARs: Fix reassigning the released bridge windows
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When a bridge window is temporarily released during the rescan, its old
size is not relevant anymore - it will be recreated from pbus_size_*(), so
it's start value should be zero.

If such window can't be reassigned, don't apply reset_resource(), so the
next retry may succeed.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6cb8b293c576..7c2c57f77c6f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -295,7 +295,8 @@ static void assign_requested_resources_sorted(struct list_head *head,
 						    0 /* don't care */,
 						    0 /* don't care */);
 			}
-			reset_resource(res);
+			if (!pci_movable_bars_enabled())
+				reset_resource(res);
 		}
 	}
 }
@@ -1579,8 +1580,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
 			 PCI_BRIDGE_RESOURCES + idx, r);
-		/* Keep the old size */
-		r->end = resource_size(r) - 1;
+		/* Don't keep the old size if the bridge will be recalculated */
+		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);
 		r->start = 0;
 		r->flags = 0;
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 05/23] PCI: hotplug: movable BARs: Fix reassigning the released bridge windows
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When a bridge window is temporarily released during the rescan, its old
size is not relevant anymore - it will be recreated from pbus_size_*(), so
it's start value should be zero.

If such window can't be reassigned, don't apply reset_resource(), so the
next retry may succeed.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6cb8b293c576..7c2c57f77c6f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -295,7 +295,8 @@ static void assign_requested_resources_sorted(struct list_head *head,
 						    0 /* don't care */,
 						    0 /* don't care */);
 			}
-			reset_resource(res);
+			if (!pci_movable_bars_enabled())
+				reset_resource(res);
 		}
 	}
 }
@@ -1579,8 +1580,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
 			 PCI_BRIDGE_RESOURCES + idx, r);
-		/* Keep the old size */
-		r->end = resource_size(r) - 1;
+		/* Don't keep the old size if the bridge will be recalculated */
+		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);
 		r->start = 0;
 		r->flags = 0;
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 06/23] PCI: hotplug: movable BARs: Recalculate all bridge windows during rescan
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When the movable BARs feature is enabled and a rescan has been requested,
release all the bridge windows and recalculate them from scratch, taking
into account all kinds for BARs: fixed, immovable, movable, new.

This increases the chances to find a memory space to fit BARs for newly
hotplugged devices, especially if no/not enough gaps were reserved by the
BIOS/bootloader/firmware.

The last step of writing the recalculated windows to the bridges is done
by the new pci_setup_bridges() function.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/probe.c     | 22 ++++++++++++++++++++++
 drivers/pci/setup-bus.c | 16 ++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index be7acc477c64..a0ec696512eb 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -253,6 +253,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 60e3b48d2251..a26bf740e9ab 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3474,6 +3474,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 		pci_config_pm_runtime_put(bus->self);
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child;
+
+		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+			continue;
+
+		child = dev->subordinate;
+		if (child)
+			pci_setup_bridges(child);
+	}
+
+	if (bus->self)
+		pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3495,8 +3514,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+
+		pci_bus_release_root_bridge_resources(root);
 		pci_assign_unassigned_root_bus_resources(root);
 
+		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
 	} else {
 		max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7c2c57f77c6f..04f626e1ac18 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1635,6 +1635,22 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
 		pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+	int i;
+	struct resource *r;
+
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+
+	pci_bus_for_each_resource(root_bus, r, i) {
+		pci_release_child_resources(root_bus, r);
+	}
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
 	struct resource *res;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 06/23] PCI: hotplug: movable BARs: Recalculate all bridge windows during rescan
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When the movable BARs feature is enabled and a rescan has been requested,
release all the bridge windows and recalculate them from scratch, taking
into account all kinds for BARs: fixed, immovable, movable, new.

This increases the chances to find a memory space to fit BARs for newly
hotplugged devices, especially if no/not enough gaps were reserved by the
BIOS/bootloader/firmware.

The last step of writing the recalculated windows to the bridges is done
by the new pci_setup_bridges() function.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/probe.c     | 22 ++++++++++++++++++++++
 drivers/pci/setup-bus.c | 16 ++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index be7acc477c64..a0ec696512eb 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -253,6 +253,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 60e3b48d2251..a26bf740e9ab 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3474,6 +3474,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 		pci_config_pm_runtime_put(bus->self);
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child;
+
+		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+			continue;
+
+		child = dev->subordinate;
+		if (child)
+			pci_setup_bridges(child);
+	}
+
+	if (bus->self)
+		pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3495,8 +3514,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+
+		pci_bus_release_root_bridge_resources(root);
 		pci_assign_unassigned_root_bus_resources(root);
 
+		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
 	} else {
 		max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7c2c57f77c6f..04f626e1ac18 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1635,6 +1635,22 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
 		pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+	int i;
+	struct resource *r;
+
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+
+	pci_bus_for_each_resource(root_bus, r, i) {
+		pci_release_child_resources(root_bus, r);
+	}
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
 	struct resource *res;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 07/23] PCI: hotplug: movable BARs: Don't allow added devices to steal resources
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When movable BARs are enabled, the PCI subsystem at first releases all the
bridge windows and then attempts to assign resources both to previously
working devices and to the newly hotplugged ones, with the same priority.

If a hotplugged device gets its BARs first, this may lead to lack of space
for already working devices, which is unacceptable. If that happens, mark
one of the new devices with the newly introduced flag PCI_DEV_DISABLED_BARS
(if it is not yet marked) and retry the BAR recalculation.

The worst case would be no BARs for hotplugged devices, while all the rest
just continue working.

The algorithm is simple and it doesn't retry different subsets of hot-added
devices in case of a failure, e.g. if there are no space to allocate BARs
for both hotplugged devices A and B, but is enough for just A, the A will
be marked with PCI_DEV_DISABLED_BARS first, then (after the next failure) -
B. As a result, A will not get BARs while it could. This issue is only
relevant when hotplugging two and more devices simultaneously.

Add a new res_mask bitmask to the struct pci_dev for storing the indices of
assigned BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  11 +++++
 drivers/pci/probe.c     | 101 ++++++++++++++++++++++++++++++++++++++--
 drivers/pci/setup-bus.c |  15 ++++++
 include/linux/pci.h     |   1 +
 4 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0ec696512eb..53249cbc21b6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -373,6 +373,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_DISABLED_BARS 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -384,6 +385,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
 }
 
+static inline void pci_dev_disable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, true);
+}
+
+static inline bool pci_dev_bars_enabled(const struct pci_dev *dev)
+{
+	return !test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags);
+}
+
 #ifdef CONFIG_PCIEAER
 #include <linux/aer.h>
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a26bf740e9ab..bf0a7d1c5d09 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3428,6 +3428,23 @@ void __weak pcibios_rescan_done(struct pci_dev *dev)
 {
 }
 
+static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
+{
+	unsigned int res_mask = 0;
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+			continue;
+
+		res_mask |= (1 << i);
+	}
+
+	return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3438,6 +3455,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
+		dev->res_mask = pci_dev_count_res_mask(dev);
+
 		if (child)
 			pci_bus_rescan_prepare(child);
 
@@ -3481,7 +3500,7 @@ static void pci_setup_bridges(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child;
 
-		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+		if (!pci_dev_is_added(dev) || !pci_dev_bars_enabled(dev))
 			continue;
 
 		child = dev->subordinate;
@@ -3493,6 +3512,83 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child_bus = dev->subordinate;
+
+		if (!pci_dev_is_added(dev) && pci_dev_bars_enabled(dev))
+			return dev;
+
+		if (child_bus) {
+			struct pci_dev *next_new_dev;
+
+			next_new_dev = pci_find_next_new_device(child_bus);
+			if (next_new_dev)
+				return next_new_dev;
+		}
+	}
+
+	return NULL;
+}
+
+static bool pci_bus_check_all_bars_reassigned(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	bool ret = true;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+		unsigned int res_mask = pci_dev_count_res_mask(dev);
+
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
+		if (dev->res_mask & ~res_mask) {
+			pci_err(dev, "Non-re-enabled resources found: 0x%x -> 0x%x\n",
+				dev->res_mask, res_mask);
+			ret = false;
+		}
+
+		if (child && !pci_bus_check_all_bars_reassigned(child))
+			ret = false;
+	}
+
+	return ret;
+}
+
+static void pci_reassign_root_bus_resources(struct pci_bus *root)
+{
+	do {
+		struct pci_dev *next_new_dev;
+
+		pci_bus_release_root_bridge_resources(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		if (pci_bus_check_all_bars_reassigned(root))
+			break;
+
+		next_new_dev = pci_find_next_new_device(root);
+		if (!next_new_dev) {
+			dev_err(&root->dev, "failed to re-assign resources even after ignoring all the hotplugged devices\n");
+			break;
+		}
+
+		dev_warn(&root->dev, "failed to re-assign resources, disable the next hotplugged device %s and retry\n",
+			 dev_name(&next_new_dev->dev));
+
+		pci_dev_disable_bars(next_new_dev);
+	} while (true);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3515,8 +3611,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 		max = pci_scan_child_bus(root);
 
-		pci_bus_release_root_bridge_resources(root);
-		pci_assign_unassigned_root_bus_resources(root);
+		pci_reassign_root_bus_resources(root);
 
 		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 04f626e1ac18..1a731002ce18 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -128,6 +128,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	int i;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
 		struct pci_dev_resource *dev_res, *tmp;
@@ -177,6 +180,9 @@ static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	u16 class = dev->class >> 8;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	/* Don't touch classless devices or host bridges or IOAPICs */
 	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
 		return;
@@ -278,6 +284,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		if (!pci_dev_bars_enabled(dev_res->dev))
+			continue;
+
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
@@ -995,6 +1004,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
@@ -1349,6 +1361,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		pdev_assign_fixed_resources(dev);
 
 		b = dev->subordinate;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e5b5eff05744..95a8113c2157 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -375,6 +375,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	unsigned int	res_mask;		/* Bitmask of assigned resources */
 
 	bool		match_driver;		/* Skip attaching driver */
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 07/23] PCI: hotplug: movable BARs: Don't allow added devices to steal resources
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When movable BARs are enabled, the PCI subsystem at first releases all the
bridge windows and then attempts to assign resources both to previously
working devices and to the newly hotplugged ones, with the same priority.

If a hotplugged device gets its BARs first, this may lead to lack of space
for already working devices, which is unacceptable. If that happens, mark
one of the new devices with the newly introduced flag PCI_DEV_DISABLED_BARS
(if it is not yet marked) and retry the BAR recalculation.

The worst case would be no BARs for hotplugged devices, while all the rest
just continue working.

The algorithm is simple and it doesn't retry different subsets of hot-added
devices in case of a failure, e.g. if there are no space to allocate BARs
for both hotplugged devices A and B, but is enough for just A, the A will
be marked with PCI_DEV_DISABLED_BARS first, then (after the next failure) -
B. As a result, A will not get BARs while it could. This issue is only
relevant when hotplugging two and more devices simultaneously.

Add a new res_mask bitmask to the struct pci_dev for storing the indices of
assigned BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  11 +++++
 drivers/pci/probe.c     | 101 ++++++++++++++++++++++++++++++++++++++--
 drivers/pci/setup-bus.c |  15 ++++++
 include/linux/pci.h     |   1 +
 4 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0ec696512eb..53249cbc21b6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -373,6 +373,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_DISABLED_BARS 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -384,6 +385,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
 }
 
+static inline void pci_dev_disable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, true);
+}
+
+static inline bool pci_dev_bars_enabled(const struct pci_dev *dev)
+{
+	return !test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags);
+}
+
 #ifdef CONFIG_PCIEAER
 #include <linux/aer.h>
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a26bf740e9ab..bf0a7d1c5d09 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3428,6 +3428,23 @@ void __weak pcibios_rescan_done(struct pci_dev *dev)
 {
 }
 
+static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
+{
+	unsigned int res_mask = 0;
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+			continue;
+
+		res_mask |= (1 << i);
+	}
+
+	return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3438,6 +3455,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
+		dev->res_mask = pci_dev_count_res_mask(dev);
+
 		if (child)
 			pci_bus_rescan_prepare(child);
 
@@ -3481,7 +3500,7 @@ static void pci_setup_bridges(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child;
 
-		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+		if (!pci_dev_is_added(dev) || !pci_dev_bars_enabled(dev))
 			continue;
 
 		child = dev->subordinate;
@@ -3493,6 +3512,83 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child_bus = dev->subordinate;
+
+		if (!pci_dev_is_added(dev) && pci_dev_bars_enabled(dev))
+			return dev;
+
+		if (child_bus) {
+			struct pci_dev *next_new_dev;
+
+			next_new_dev = pci_find_next_new_device(child_bus);
+			if (next_new_dev)
+				return next_new_dev;
+		}
+	}
+
+	return NULL;
+}
+
+static bool pci_bus_check_all_bars_reassigned(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	bool ret = true;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+		unsigned int res_mask = pci_dev_count_res_mask(dev);
+
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
+		if (dev->res_mask & ~res_mask) {
+			pci_err(dev, "Non-re-enabled resources found: 0x%x -> 0x%x\n",
+				dev->res_mask, res_mask);
+			ret = false;
+		}
+
+		if (child && !pci_bus_check_all_bars_reassigned(child))
+			ret = false;
+	}
+
+	return ret;
+}
+
+static void pci_reassign_root_bus_resources(struct pci_bus *root)
+{
+	do {
+		struct pci_dev *next_new_dev;
+
+		pci_bus_release_root_bridge_resources(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		if (pci_bus_check_all_bars_reassigned(root))
+			break;
+
+		next_new_dev = pci_find_next_new_device(root);
+		if (!next_new_dev) {
+			dev_err(&root->dev, "failed to re-assign resources even after ignoring all the hotplugged devices\n");
+			break;
+		}
+
+		dev_warn(&root->dev, "failed to re-assign resources, disable the next hotplugged device %s and retry\n",
+			 dev_name(&next_new_dev->dev));
+
+		pci_dev_disable_bars(next_new_dev);
+	} while (true);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3515,8 +3611,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 		max = pci_scan_child_bus(root);
 
-		pci_bus_release_root_bridge_resources(root);
-		pci_assign_unassigned_root_bus_resources(root);
+		pci_reassign_root_bus_resources(root);
 
 		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 04f626e1ac18..1a731002ce18 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -128,6 +128,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	int i;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
 		struct pci_dev_resource *dev_res, *tmp;
@@ -177,6 +180,9 @@ static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	u16 class = dev->class >> 8;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	/* Don't touch classless devices or host bridges or IOAPICs */
 	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
 		return;
@@ -278,6 +284,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		if (!pci_dev_bars_enabled(dev_res->dev))
+			continue;
+
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
@@ -995,6 +1004,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
@@ -1349,6 +1361,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		pdev_assign_fixed_resources(dev);
 
 		b = dev->subordinate;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e5b5eff05744..95a8113c2157 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -375,6 +375,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	unsigned int	res_mask;		/* Bitmask of assigned resources */
 
 	bool		match_driver;		/* Skip attaching driver */
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 08/23] PCI: Include fixed and immovable BARs into the bus size calculating
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

The only difference between the fixed/immovable and movable BARs is a size
and offset preservation after they are released (the corresponding struct
resource* detached from a bridge window for a while during a bus rescan).

Include fixed/immovable BARs into result of pbus_size_mem() and prohibit
assigning them to non-direct parents.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1a731002ce18..2c250efca512 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1011,12 +1011,21 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			if (r->parent ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
 				continue;
 			r_size = resource_size(r);
+
+			if ((r->flags & IORESOURCE_PCI_FIXED) ||
+			    !pci_dev_movable_bars_supported(dev)) {
+				if (pci_movable_bars_enabled())
+					size += r_size;
+
+				continue;
+			}
+
 #ifdef CONFIG_PCI_IOV
 			/* Put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 08/23] PCI: Include fixed and immovable BARs into the bus size calculating
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

The only difference between the fixed/immovable and movable BARs is a size
and offset preservation after they are released (the corresponding struct
resource* detached from a bridge window for a while during a bus rescan).

Include fixed/immovable BARs into result of pbus_size_mem() and prohibit
assigning them to non-direct parents.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1a731002ce18..2c250efca512 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1011,12 +1011,21 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			if (r->parent ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
 				continue;
 			r_size = resource_size(r);
+
+			if ((r->flags & IORESOURCE_PCI_FIXED) ||
+			    !pci_dev_movable_bars_supported(dev)) {
+				if (pci_movable_bars_enabled())
+					size += r_size;
+
+				continue;
+			}
+
 #ifdef CONFIG_PCI_IOV
 			/* Put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 09/23] PCI: Prohibit assigning BARs and bridge windows to non-direct parents
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When movable BARs are enabled, the feature of resource relocating from
commit 2bbc6942273b5 ("PCI : ability to relocate assigned pci-resources")
is not used. Instead, inability to assign a resource is used as a signal
to retry BAR assignment with other configuration of bridge windows.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c |  2 ++
 drivers/pci/setup-res.c | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 2c250efca512..aee330047121 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1356,6 +1356,8 @@ static void pdev_assign_fixed_resources(struct pci_dev *dev)
 		while (b && !r->parent) {
 			assign_fixed_resource_on_bus(b, r);
 			b = b->parent;
+			if (!r->parent && pci_movable_bars_enabled())
+				break;
 		}
 	}
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..732d18f60f1b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
 
 	bus = dev->bus;
 	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+		if (pci_movable_bars_enabled()) {
+			if (resno >= PCI_BRIDGE_RESOURCES &&
+			    resno <= PCI_BRIDGE_RESOURCE_END) {
+				struct resource *res = dev->resource + resno;
+
+				res->start = 0;
+				res->end = 0;
+				res->flags = 0;
+			}
+			break;
+		}
+
 		if (!bus->parent || !bus->self->transparent)
 			break;
 		bus = bus->parent;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 09/23] PCI: Prohibit assigning BARs and bridge windows to non-direct parents
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When movable BARs are enabled, the feature of resource relocating from
commit 2bbc6942273b5 ("PCI : ability to relocate assigned pci-resources")
is not used. Instead, inability to assign a resource is used as a signal
to retry BAR assignment with other configuration of bridge windows.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c |  2 ++
 drivers/pci/setup-res.c | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 2c250efca512..aee330047121 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1356,6 +1356,8 @@ static void pdev_assign_fixed_resources(struct pci_dev *dev)
 		while (b && !r->parent) {
 			assign_fixed_resource_on_bus(b, r);
 			b = b->parent;
+			if (!r->parent && pci_movable_bars_enabled())
+				break;
 		}
 	}
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..732d18f60f1b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
 
 	bus = dev->bus;
 	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+		if (pci_movable_bars_enabled()) {
+			if (resno >= PCI_BRIDGE_RESOURCES &&
+			    resno <= PCI_BRIDGE_RESOURCE_END) {
+				struct resource *res = dev->resource + resno;
+
+				res->start = 0;
+				res->end = 0;
+				res->flags = 0;
+			}
+			break;
+		}
+
 		if (!bus->parent || !bus->self->transparent)
 			break;
 		bus = bus->parent;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 10/23] PCI: hotplug: movable BARs: Try to assign unassigned resources only once
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

With enabled BAR movement, BARs and bridge windows can only be assigned to
their direct parents, so there can be only one variant of resource tree,
thus every retry within the pci_assign_unassigned_root_bus_resources() will
result in the same tree, and it is enough to try just once.

In case of failures the pci_reassign_root_bus_resources() disables BARs for
one of the hotplugged devices and tries the assignment again.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index aee330047121..33f709095675 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,13 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	int pci_try_num = 1;
 	enum enable_type enable_local;
 
+	if (pci_movable_bars_enabled()) {
+		__pci_bus_size_bridges(bus, NULL);
+		__pci_bus_assign_resources(bus, NULL, NULL);
+
+		goto dump;
+	}
+
 	/* Don't realloc if asked to do so */
 	enable_local = pci_realloc_detect(bus, pci_realloc_enable);
 	if (pci_realloc_enabled(enable_local)) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 10/23] PCI: hotplug: movable BARs: Try to assign unassigned resources only once
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

With enabled BAR movement, BARs and bridge windows can only be assigned to
their direct parents, so there can be only one variant of resource tree,
thus every retry within the pci_assign_unassigned_root_bus_resources() will
result in the same tree, and it is enough to try just once.

In case of failures the pci_reassign_root_bus_resources() disables BARs for
one of the hotplugged devices and tries the assignment again.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index aee330047121..33f709095675 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,13 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	int pci_try_num = 1;
 	enum enable_type enable_local;
 
+	if (pci_movable_bars_enabled()) {
+		__pci_bus_size_bridges(bus, NULL);
+		__pci_bus_assign_resources(bus, NULL, NULL);
+
+		goto dump;
+	}
+
 	/* Don't realloc if asked to do so */
 	enable_local = pci_realloc_detect(bus, pci_realloc_enable);
 	if (pci_realloc_enabled(enable_local)) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 11/23] PCI: hotplug: movable BARs: Calculate immovable parts of bridge windows
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When movable BARs are enabled, and if a bridge contains a device with fixed
(IORESOURCE_PCI_FIXED) or immovable BARs, the corresponing windows can't be
moved too far away from their original positions - they must still contain
all the fixed/immovable BARs, like that:

  1) Window position before a bus rescan:

  | <--                    root bridge window                        --> |
  |                                                                      |
  | | <--     bridge window    --> |                                     |
  | | movable BARs | **fixed BAR** |                                     |

  2) Possible valid outcome after rescan and move:

  | <--                    root bridge window                        --> |
  |                                                                      |
  |                | <--     bridge window    --> |                      |
  |                | **fixed BAR** | Movable BARs |                      |

An immovable area of a bridge (separare for IO, MEM and MEM64 window types)
is a range that covers all the fixed and immovable BARs of direct children,
and all the fixed area of children bridges:

  | <--                    root bridge window                        --> |
  |                                                                      |
  |  | <--                  bridge window level 1                --> |   |
  |  | ******** immovable area of this bridge window ********        |   |
  |  |                                                               |   |
  |  | **fixed BAR**  | <--      bridge window level 2    --> | BARs |   |
  |  |                | ***** fixed area of this bridge ***** |      |   |
  |  |                |                                       |      |   |
  |  |                | ***fixed BAR*** |   | ***fixed BAR*** |      |   |

To store these areas, the .immovable_range field has been added to struct
pci_bus. It is filled recursively from leaves to the root before a rescan.

Also make pbus_size_io() and pbus_size_mem() return their usual result OR
the size of an immovable range of according type, depending on which one is
larger.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       | 14 +++++++
 drivers/pci/probe.c     | 88 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c | 17 ++++++++
 include/linux/pci.h     |  6 +++
 4 files changed, 125 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 53249cbc21b6..12add575faf1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -371,6 +371,20 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 	return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+	int idx = 1;
+
+	if (r->flags & IORESOURCE_IO)
+		idx = 0;
+	else if (!(r->flags & IORESOURCE_PREFETCH))
+		idx = 1;
+	else if (r->flags & IORESOURCE_MEM_64)
+		idx = 2;
+
+	return idx;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_DISABLED_BARS 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bf0a7d1c5d09..5f52a19738aa 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -550,6 +550,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
 	struct pci_bus *b;
+	int idx;
 
 	b = kzalloc(sizeof(*b), GFP_KERNEL);
 	if (!b)
@@ -566,6 +567,11 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	if (parent)
 		b->domain_nr = parent->domain_nr;
 #endif
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		b->immovable_range[idx].start = 0;
+		b->immovable_range[idx].end = 0;
+	}
+
 	return b;
 }
 
@@ -3512,6 +3518,87 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_immovable_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int idx;
+	resource_size_t start, end;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		bus->immovable_range[idx].start = 0;
+		bus->immovable_range[idx].end = 0;
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_immovable_range(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+		bool dev_is_movable = pci_dev_movable_bars_supported(dev);
+		struct pci_bus *child = dev->subordinate;
+
+		for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+				continue;
+
+			if (!dev_is_movable || (r->flags & IORESOURCE_PCI_FIXED)) {
+				idx = pci_get_bridge_resource_idx(r);
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > r->start)
+					start = r->start;
+				if (end < r->end)
+					end = r->end;
+
+				if (bus->immovable_range[idx].start != start ||
+				    bus->immovable_range[idx].end != end) {
+					dev_dbg(&bus->dev, "Found fixed 0x%llx-0x%llx in %s, expand the fixed bridge window %d to 0x%llx-0x%llx\n",
+						(unsigned long long)r->start,
+						(unsigned long long)r->end,
+						dev_name(&dev->dev), idx,
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+
+		if (child) {
+			for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+				struct resource *child_immovable_range =
+					&child->immovable_range[idx];
+
+				if (child_immovable_range->start >=
+				    child_immovable_range->end)
+					continue;
+
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > child_immovable_range->start)
+					start = child_immovable_range->start;
+				if (end < child_immovable_range->end)
+					end = child_immovable_range->end;
+
+				if (start < bus->immovable_range[idx].start ||
+				    end > bus->immovable_range[idx].end) {
+					dev_dbg(&bus->dev, "Expand the fixed bridge window %d from %s to 0x%llx-0x%llx\n",
+						idx, dev_name(&child->dev),
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+	}
+}
+
 static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3610,6 +3697,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+		pci_bus_update_immovable_range(root);
 
 		pci_reassign_root_bus_resources(root);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 33f709095675..420510a1a257 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -882,9 +882,17 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
+	resource_size_t fixed_start = bus->immovable_range[0].start;
+	resource_size_t fixed_end = bus->immovable_range[0].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
 	if (!b_res)
 		return;
 
+	if (min_size < fixed_size)
+		min_size = fixed_size;
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -993,6 +1001,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	bool is_mem64 = (mask & IORESOURCE_MEM_64);
+
+	resource_size_t fixed_start = bus->immovable_range[is_mem64 ? 2 : 1].start;
+	resource_size_t fixed_end = bus->immovable_range[is_mem64 ? 2 : 1].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
+	if (min_size < fixed_size)
+		min_size = fixed_size;
 
 	if (!b_res)
 		return -ENOSPC;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 95a8113c2157..efafbf816fe6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -581,6 +581,12 @@ struct pci_bus {
 	struct list_head resources;	/* Address space routed to this bus */
 	struct resource busn_res;	/* Bus numbers routed to this bus */
 
+	/*
+	 * If there are fixed or immovable resources in the bridge window, this range
+	 * contains the lowest start address and highest end address of them.
+	 */
+	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 11/23] PCI: hotplug: movable BARs: Calculate immovable parts of bridge windows
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When movable BARs are enabled, and if a bridge contains a device with fixed
(IORESOURCE_PCI_FIXED) or immovable BARs, the corresponing windows can't be
moved too far away from their original positions - they must still contain
all the fixed/immovable BARs, like that:

  1) Window position before a bus rescan:

  | <--                    root bridge window                        --> |
  |                                                                      |
  | | <--     bridge window    --> |                                     |
  | | movable BARs | **fixed BAR** |                                     |

  2) Possible valid outcome after rescan and move:

  | <--                    root bridge window                        --> |
  |                                                                      |
  |                | <--     bridge window    --> |                      |
  |                | **fixed BAR** | Movable BARs |                      |

An immovable area of a bridge (separare for IO, MEM and MEM64 window types)
is a range that covers all the fixed and immovable BARs of direct children,
and all the fixed area of children bridges:

  | <--                    root bridge window                        --> |
  |                                                                      |
  |  | <--                  bridge window level 1                --> |   |
  |  | ******** immovable area of this bridge window ********        |   |
  |  |                                                               |   |
  |  | **fixed BAR**  | <--      bridge window level 2    --> | BARs |   |
  |  |                | ***** fixed area of this bridge ***** |      |   |
  |  |                |                                       |      |   |
  |  |                | ***fixed BAR*** |   | ***fixed BAR*** |      |   |

To store these areas, the .immovable_range field has been added to struct
pci_bus. It is filled recursively from leaves to the root before a rescan.

Also make pbus_size_io() and pbus_size_mem() return their usual result OR
the size of an immovable range of according type, depending on which one is
larger.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       | 14 +++++++
 drivers/pci/probe.c     | 88 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c | 17 ++++++++
 include/linux/pci.h     |  6 +++
 4 files changed, 125 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 53249cbc21b6..12add575faf1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -371,6 +371,20 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 	return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+	int idx = 1;
+
+	if (r->flags & IORESOURCE_IO)
+		idx = 0;
+	else if (!(r->flags & IORESOURCE_PREFETCH))
+		idx = 1;
+	else if (r->flags & IORESOURCE_MEM_64)
+		idx = 2;
+
+	return idx;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_DISABLED_BARS 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bf0a7d1c5d09..5f52a19738aa 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -550,6 +550,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
 	struct pci_bus *b;
+	int idx;
 
 	b = kzalloc(sizeof(*b), GFP_KERNEL);
 	if (!b)
@@ -566,6 +567,11 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	if (parent)
 		b->domain_nr = parent->domain_nr;
 #endif
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		b->immovable_range[idx].start = 0;
+		b->immovable_range[idx].end = 0;
+	}
+
 	return b;
 }
 
@@ -3512,6 +3518,87 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_immovable_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int idx;
+	resource_size_t start, end;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		bus->immovable_range[idx].start = 0;
+		bus->immovable_range[idx].end = 0;
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_immovable_range(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+		bool dev_is_movable = pci_dev_movable_bars_supported(dev);
+		struct pci_bus *child = dev->subordinate;
+
+		for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+				continue;
+
+			if (!dev_is_movable || (r->flags & IORESOURCE_PCI_FIXED)) {
+				idx = pci_get_bridge_resource_idx(r);
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > r->start)
+					start = r->start;
+				if (end < r->end)
+					end = r->end;
+
+				if (bus->immovable_range[idx].start != start ||
+				    bus->immovable_range[idx].end != end) {
+					dev_dbg(&bus->dev, "Found fixed 0x%llx-0x%llx in %s, expand the fixed bridge window %d to 0x%llx-0x%llx\n",
+						(unsigned long long)r->start,
+						(unsigned long long)r->end,
+						dev_name(&dev->dev), idx,
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+
+		if (child) {
+			for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+				struct resource *child_immovable_range =
+					&child->immovable_range[idx];
+
+				if (child_immovable_range->start >=
+				    child_immovable_range->end)
+					continue;
+
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > child_immovable_range->start)
+					start = child_immovable_range->start;
+				if (end < child_immovable_range->end)
+					end = child_immovable_range->end;
+
+				if (start < bus->immovable_range[idx].start ||
+				    end > bus->immovable_range[idx].end) {
+					dev_dbg(&bus->dev, "Expand the fixed bridge window %d from %s to 0x%llx-0x%llx\n",
+						idx, dev_name(&child->dev),
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+	}
+}
+
 static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3610,6 +3697,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+		pci_bus_update_immovable_range(root);
 
 		pci_reassign_root_bus_resources(root);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 33f709095675..420510a1a257 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -882,9 +882,17 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
+	resource_size_t fixed_start = bus->immovable_range[0].start;
+	resource_size_t fixed_end = bus->immovable_range[0].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
 	if (!b_res)
 		return;
 
+	if (min_size < fixed_size)
+		min_size = fixed_size;
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -993,6 +1001,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	bool is_mem64 = (mask & IORESOURCE_MEM_64);
+
+	resource_size_t fixed_start = bus->immovable_range[is_mem64 ? 2 : 1].start;
+	resource_size_t fixed_end = bus->immovable_range[is_mem64 ? 2 : 1].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
+	if (min_size < fixed_size)
+		min_size = fixed_size;
 
 	if (!b_res)
 		return -ENOSPC;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 95a8113c2157..efafbf816fe6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -581,6 +581,12 @@ struct pci_bus {
 	struct list_head resources;	/* Address space routed to this bus */
 	struct resource busn_res;	/* Bus numbers routed to this bus */
 
+	/*
+	 * If there are fixed or immovable resources in the bridge window, this range
+	 * contains the lowest start address and highest end address of them.
+	 */
+	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 12/23] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

With enabled movable BARs, bridge windows are recalculated during each pci
rescan. Some of the BARs below the bridge may be fixed/immovable: these
areas are represented by the .immovable_range field in struct pci_bus.

If a bridge window size is equal to its immovable range, it can only be
assigned to the start of this range. But if a bridge window size is larger,
and this difference in size is denoted as "delta", the window can start
from (immovable_range.start - delta) to (immovable_range.start), and it can
end from (immovable_range.end) to (immovable_range.end + delta). This range
(the new .realloc_range field in struct pci_bus) must then be compared with
immovable ranges of neighbouring bridges to guarantee no intersections.

This patch only calculates valid ranges for reallocated bridges during pci
rescan, and the next one will make use of these values during allocation.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 67 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  6 ++++
 2 files changed, 73 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 420510a1a257..586aaa9578b2 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,72 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+/*
+ * Calculate the address margins where the bridge windows may be allocated to fit all
+ * the fixed and immovable BARs beneath.
+ */
+static void pci_bus_update_realloc_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_bus *parent = bus->parent;
+	int idx;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_realloc_range(dev->subordinate);
+
+	if (!parent || !bus->self)
+		return;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		struct resource *immovable_range = &bus->immovable_range[idx];
+		resource_size_t window_size = resource_size(bus->resource[idx]);
+		resource_size_t realloc_start, realloc_end;
+
+		bus->realloc_range[idx].start = 0;
+		bus->realloc_range[idx].end = 0;
+
+		/* Check if there any immovable BARs under the bridge */
+		if (immovable_range->start >= immovable_range->end)
+			continue;
+
+		/* The lowest possible address where the bridge window can start */
+		realloc_start = immovable_range->end - window_size + 1;
+		/* The highest possible address where the bridge window can end */
+		realloc_end = immovable_range->start + window_size - 1;
+
+		if (realloc_start > immovable_range->start)
+			realloc_start = immovable_range->start;
+
+		if (realloc_end < immovable_range->end)
+			realloc_end = immovable_range->end;
+
+		/*
+		 * Check that realloc range doesn't intersect with hard fixed ranges
+		 * of neighboring bridges
+		 */
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			struct pci_bus *neighbor = dev->subordinate;
+			struct resource *n_imm_range;
+
+			if (!neighbor || neighbor == bus)
+				continue;
+
+			n_imm_range = &neighbor->immovable_range[idx];
+
+			if (n_imm_range->start >= n_imm_range->end)
+				continue;
+
+			if (n_imm_range->end < immovable_range->start &&
+			    n_imm_range->end > realloc_start)
+				realloc_start = n_imm_range->end;
+		}
+
+		bus->realloc_range[idx].start = realloc_start;
+		bus->realloc_range[idx].end = realloc_end;
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -1838,6 +1904,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 
 	if (pci_movable_bars_enabled()) {
 		__pci_bus_size_bridges(bus, NULL);
+		pci_bus_update_realloc_range(bus);
 		__pci_bus_assign_resources(bus, NULL, NULL);
 
 		goto dump;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index efafbf816fe6..bf6638cf2525 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -587,6 +587,12 @@ struct pci_bus {
 	 */
 	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
 
+	/*
+	 * Acceptable address range, where the bridge window may reside, considering its
+	 * size, so it will cover all the fixed and immovable BARs below.
+	 */
+	struct resource realloc_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 12/23] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

With enabled movable BARs, bridge windows are recalculated during each pci
rescan. Some of the BARs below the bridge may be fixed/immovable: these
areas are represented by the .immovable_range field in struct pci_bus.

If a bridge window size is equal to its immovable range, it can only be
assigned to the start of this range. But if a bridge window size is larger,
and this difference in size is denoted as "delta", the window can start
from (immovable_range.start - delta) to (immovable_range.start), and it can
end from (immovable_range.end) to (immovable_range.end + delta). This range
(the new .realloc_range field in struct pci_bus) must then be compared with
immovable ranges of neighbouring bridges to guarantee no intersections.

This patch only calculates valid ranges for reallocated bridges during pci
rescan, and the next one will make use of these values during allocation.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 67 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  6 ++++
 2 files changed, 73 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 420510a1a257..586aaa9578b2 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,72 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+/*
+ * Calculate the address margins where the bridge windows may be allocated to fit all
+ * the fixed and immovable BARs beneath.
+ */
+static void pci_bus_update_realloc_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_bus *parent = bus->parent;
+	int idx;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_realloc_range(dev->subordinate);
+
+	if (!parent || !bus->self)
+		return;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		struct resource *immovable_range = &bus->immovable_range[idx];
+		resource_size_t window_size = resource_size(bus->resource[idx]);
+		resource_size_t realloc_start, realloc_end;
+
+		bus->realloc_range[idx].start = 0;
+		bus->realloc_range[idx].end = 0;
+
+		/* Check if there any immovable BARs under the bridge */
+		if (immovable_range->start >= immovable_range->end)
+			continue;
+
+		/* The lowest possible address where the bridge window can start */
+		realloc_start = immovable_range->end - window_size + 1;
+		/* The highest possible address where the bridge window can end */
+		realloc_end = immovable_range->start + window_size - 1;
+
+		if (realloc_start > immovable_range->start)
+			realloc_start = immovable_range->start;
+
+		if (realloc_end < immovable_range->end)
+			realloc_end = immovable_range->end;
+
+		/*
+		 * Check that realloc range doesn't intersect with hard fixed ranges
+		 * of neighboring bridges
+		 */
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			struct pci_bus *neighbor = dev->subordinate;
+			struct resource *n_imm_range;
+
+			if (!neighbor || neighbor == bus)
+				continue;
+
+			n_imm_range = &neighbor->immovable_range[idx];
+
+			if (n_imm_range->start >= n_imm_range->end)
+				continue;
+
+			if (n_imm_range->end < immovable_range->start &&
+			    n_imm_range->end > realloc_start)
+				realloc_start = n_imm_range->end;
+		}
+
+		bus->realloc_range[idx].start = realloc_start;
+		bus->realloc_range[idx].end = realloc_end;
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -1838,6 +1904,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 
 	if (pci_movable_bars_enabled()) {
 		__pci_bus_size_bridges(bus, NULL);
+		pci_bus_update_realloc_range(bus);
 		__pci_bus_assign_resources(bus, NULL, NULL);
 
 		goto dump;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index efafbf816fe6..bf6638cf2525 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -587,6 +587,12 @@ struct pci_bus {
 	 */
 	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
 
+	/*
+	 * Acceptable address range, where the bridge window may reside, considering its
+	 * size, so it will cover all the fixed and immovable BARs below.
+	 */
+	struct resource realloc_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 13/23] PCI: Make sure bridge windows include their fixed BARs
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When the time comes to select a start address for the bridge window during
the root bus rescan, it should be not just a lowest possible address: this
window must cover all the underlying fixed and immovable BARs. The lowest
address that satisfies this requirement is the .realloc_range field of
struct pci_bus, which is calculated during the preparation to the rescan.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-res.c | 28 ++++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 495059d923f7..7aae830751e9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		 * this is an already-configured bridge window, its start
 		 * overrides "min".
 		 */
-		if (avail.start)
+		if (min_used < avail.start)
 			min_used = avail.start;
 
 		max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 732d18f60f1b..7357bcc12a53 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,20 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	resource_size_t start = (resource_size_t)-1;
+	resource_size_t end = 0;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+	if (dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+		struct pci_bus *child_bus = dev->subordinate;
+		int b_resno = resno - PCI_BRIDGE_RESOURCES;
+		struct resource *immovable_range = &child_bus->immovable_range[b_resno];
+
+		if (immovable_range->start < immovable_range->end)
+			min = child_bus->realloc_range[b_resno].start;
+	}
+
 	/*
 	 * First, try exact prefetching match.  Even if a 64-bit
 	 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +273,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
-		return 0;
+		goto check_fixed;
 
 	/*
 	 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +285,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 					     IORESOURCE_PREFETCH,
 					     pcibios_align_resource, dev);
 		if (ret == 0)
-			return 0;
+			goto check_fixed;
 	}
 
 	/*
@@ -287,6 +298,19 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
+check_fixed:
+	if (ret == 0 && start < end) {
+		if (res->start > start || res->end < end) {
+			dev_err(&bus->dev, "fixed area 0x%llx-0x%llx for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)",
+				(unsigned long long)start, (unsigned long long)end,
+				dev_name(&dev->dev),
+				res, (unsigned long long)res->start,
+				(unsigned long long)res->end);
+			release_resource(res);
+			return -1;
+		}
+	}
+
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 13/23] PCI: Make sure bridge windows include their fixed BARs
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

When the time comes to select a start address for the bridge window during
the root bus rescan, it should be not just a lowest possible address: this
window must cover all the underlying fixed and immovable BARs. The lowest
address that satisfies this requirement is the .realloc_range field of
struct pci_bus, which is calculated during the preparation to the rescan.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-res.c | 28 ++++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 495059d923f7..7aae830751e9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		 * this is an already-configured bridge window, its start
 		 * overrides "min".
 		 */
-		if (avail.start)
+		if (min_used < avail.start)
 			min_used = avail.start;
 
 		max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 732d18f60f1b..7357bcc12a53 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,20 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	resource_size_t start = (resource_size_t)-1;
+	resource_size_t end = 0;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+	if (dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+		struct pci_bus *child_bus = dev->subordinate;
+		int b_resno = resno - PCI_BRIDGE_RESOURCES;
+		struct resource *immovable_range = &child_bus->immovable_range[b_resno];
+
+		if (immovable_range->start < immovable_range->end)
+			min = child_bus->realloc_range[b_resno].start;
+	}
+
 	/*
 	 * First, try exact prefetching match.  Even if a 64-bit
 	 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +273,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
-		return 0;
+		goto check_fixed;
 
 	/*
 	 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +285,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 					     IORESOURCE_PREFETCH,
 					     pcibios_align_resource, dev);
 		if (ret == 0)
-			return 0;
+			goto check_fixed;
 	}
 
 	/*
@@ -287,6 +298,19 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
+check_fixed:
+	if (ret == 0 && start < end) {
+		if (res->start > start || res->end < end) {
+			dev_err(&bus->dev, "fixed area 0x%llx-0x%llx for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)",
+				(unsigned long long)start, (unsigned long long)end,
+				dev_name(&dev->dev),
+				res, (unsigned long long)res->start,
+				(unsigned long long)res->end);
+			release_resource(res);
+			return -1;
+		}
+	}
+
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 14/23] PCI: Fix assigning the fixed prefetchable resources
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Allow matching IORESOURCE_PCI_FIXED prefetchable BARs to non-prefetchable
windows, so they follow the same rules as immovable BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 586aaa9578b2..6f12411357f3 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1340,15 +1340,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
-	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-			     IORESOURCE_PREFETCH;
+	unsigned long mask = IORESOURCE_TYPE_BITS;
 
 	pci_bus_for_each_resource(b, parent_r, i) {
 		if (!parent_r)
 			continue;
 
-		if ((r->flags & mask) == (parent_r->flags & mask) &&
-		    resource_contains(parent_r, r))
+		if ((r->flags & mask) != (parent_r->flags & mask))
+			continue;
+
+		if (parent_r->flags & IORESOURCE_PREFETCH &&
+		    !(r->flags & IORESOURCE_PREFETCH))
+			continue;
+
+		if (resource_contains(parent_r, r))
 			request_resource(parent_r, r);
 	}
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 14/23] PCI: Fix assigning the fixed prefetchable resources
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

Allow matching IORESOURCE_PCI_FIXED prefetchable BARs to non-prefetchable
windows, so they follow the same rules as immovable BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 586aaa9578b2..6f12411357f3 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1340,15 +1340,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
-	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-			     IORESOURCE_PREFETCH;
+	unsigned long mask = IORESOURCE_TYPE_BITS;
 
 	pci_bus_for_each_resource(b, parent_r, i) {
 		if (!parent_r)
 			continue;
 
-		if ((r->flags & mask) == (parent_r->flags & mask) &&
-		    resource_contains(parent_r, r))
+		if ((r->flags & mask) != (parent_r->flags & mask))
+			continue;
+
+		if (parent_r->flags & IORESOURCE_PREFETCH &&
+		    !(r->flags & IORESOURCE_PREFETCH))
+			continue;
+
+		if (resource_contains(parent_r, r))
 			request_resource(parent_r, r);
 	}
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 15/23] PCI: hotplug: movable BARs: Assign fixed and immovable BARs before others
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Reassign resources during rescan in two steps: first the fixed/immovable
BARs and bridge windows that have fixed areas, so the movable ones will not
steal these reserved areas; then the rest - so the movable BARs will divide
the rest of the space.

With this change, pci_assign_resource() is now able to assign all types of
BARs, so the pdev_assign_fixed_resources() became unused and thus removed.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  2 ++
 drivers/pci/setup-bus.c | 79 ++++++++++++++++++++++++-----------------
 drivers/pci/setup-res.c |  8 +++--
 3 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12add575faf1..e1fcc46f9c40 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -260,6 +260,8 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 
 bool pci_dev_movable_bars_supported(struct pci_dev *dev);
 
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6f12411357f3..c7b7e30c6284 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -38,6 +38,15 @@ struct pci_dev_resource {
 	unsigned long flags;
 };
 
+enum assign_step {
+	assign_fixed_resources,
+	assign_float_resources,
+};
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step);
+
 static void free_list(struct list_head *head)
 {
 	struct pci_dev_resource *dev_res, *tmp;
@@ -278,19 +287,48 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
  */
 static void assign_requested_resources_sorted(struct list_head *head,
 				 struct list_head *fail_head)
+{
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_resources);
+	_assign_requested_resources_sorted(head, fail_head, assign_float_resources);
+}
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step)
 {
 	struct resource *res;
 	struct pci_dev_resource *dev_res;
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		bool is_fixed = false;
+
 		if (!pci_dev_bars_enabled(dev_res->dev))
 			continue;
 
 		res = dev_res->res;
+		if (!resource_size(res))
+			continue;
+
 		idx = res - &dev_res->dev->resource[0];
-		if (resource_size(res) &&
-		    pci_assign_resource(dev_res->dev, idx)) {
+
+		if (idx < PCI_BRIDGE_RESOURCES) {
+			is_fixed = (res->flags & IORESOURCE_PCI_FIXED) ||
+				!pci_dev_movable_bars_supported(dev_res->dev);
+		} else {
+			int b_res_idx = pci_get_bridge_resource_idx(res);
+			struct resource *fixed_res =
+				&dev_res->dev->subordinate->immovable_range[b_res_idx];
+
+			is_fixed = (fixed_res->start < fixed_res->end);
+		}
+
+		if (assign_fixed_resources == step && !is_fixed)
+			continue;
+		else if (assign_float_resources == step && is_fixed)
+			continue;
+
+		if (pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head) {
 				/*
 				 * If the failed resource is a ROM BAR and
@@ -1336,7 +1374,7 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
-static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
@@ -1353,35 +1391,14 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 		    !(r->flags & IORESOURCE_PREFETCH))
 			continue;
 
-		if (resource_contains(parent_r, r))
-			request_resource(parent_r, r);
-	}
-}
-
-/*
- * Try to assign any resources marked as IORESOURCE_PCI_FIXED, as they are
- * skipped by pbus_assign_resources_sorted().
- */
-static void pdev_assign_fixed_resources(struct pci_dev *dev)
-{
-	int i;
-
-	for (i = 0; i <  PCI_NUM_RESOURCES; i++) {
-		struct pci_bus *b;
-		struct resource *r = &dev->resource[i];
-
-		if (r->parent || !(r->flags & IORESOURCE_PCI_FIXED) ||
-		    !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-
-		b = dev->bus;
-		while (b && !r->parent) {
-			assign_fixed_resource_on_bus(b, r);
-			b = b->parent;
-			if (!r->parent && pci_movable_bars_enabled())
-				break;
+		if (resource_contains(parent_r, r)) {
+			if (!request_resource(parent_r, r))
+				return 0;
 		}
 	}
+
+	dev_err(&b->dev, "failed to assign immovable %pR\n", r);
+	return -EBUSY;
 }
 
 void __pci_bus_assign_resources(const struct pci_bus *bus,
@@ -1397,8 +1414,6 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 		if (!pci_dev_bars_enabled(dev))
 			continue;
 
-		pdev_assign_fixed_resources(dev);
-
 		b = dev->subordinate;
 		if (!b)
 			continue;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7357bcc12a53..02620cfac0c7 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -348,8 +348,12 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 	resource_size_t align, size;
 	int ret;
 
-	if (res->flags & IORESOURCE_PCI_FIXED)
-		return 0;
+	if ((res->flags & IORESOURCE_PCI_FIXED) ||
+	    (resno < PCI_BRIDGE_RESOURCES &&
+	     !pci_dev_movable_bars_supported(dev) &&
+	     res->start)) {
+		return assign_fixed_resource_on_bus(dev->bus, res);
+	}
 
 	res->flags |= IORESOURCE_UNSET;
 	align = pci_resource_alignment(dev, res);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 15/23] PCI: hotplug: movable BARs: Assign fixed and immovable BARs before others
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

Reassign resources during rescan in two steps: first the fixed/immovable
BARs and bridge windows that have fixed areas, so the movable ones will not
steal these reserved areas; then the rest - so the movable BARs will divide
the rest of the space.

With this change, pci_assign_resource() is now able to assign all types of
BARs, so the pdev_assign_fixed_resources() became unused and thus removed.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  2 ++
 drivers/pci/setup-bus.c | 79 ++++++++++++++++++++++++-----------------
 drivers/pci/setup-res.c |  8 +++--
 3 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12add575faf1..e1fcc46f9c40 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -260,6 +260,8 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 
 bool pci_dev_movable_bars_supported(struct pci_dev *dev);
 
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6f12411357f3..c7b7e30c6284 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -38,6 +38,15 @@ struct pci_dev_resource {
 	unsigned long flags;
 };
 
+enum assign_step {
+	assign_fixed_resources,
+	assign_float_resources,
+};
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step);
+
 static void free_list(struct list_head *head)
 {
 	struct pci_dev_resource *dev_res, *tmp;
@@ -278,19 +287,48 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
  */
 static void assign_requested_resources_sorted(struct list_head *head,
 				 struct list_head *fail_head)
+{
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_resources);
+	_assign_requested_resources_sorted(head, fail_head, assign_float_resources);
+}
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step)
 {
 	struct resource *res;
 	struct pci_dev_resource *dev_res;
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		bool is_fixed = false;
+
 		if (!pci_dev_bars_enabled(dev_res->dev))
 			continue;
 
 		res = dev_res->res;
+		if (!resource_size(res))
+			continue;
+
 		idx = res - &dev_res->dev->resource[0];
-		if (resource_size(res) &&
-		    pci_assign_resource(dev_res->dev, idx)) {
+
+		if (idx < PCI_BRIDGE_RESOURCES) {
+			is_fixed = (res->flags & IORESOURCE_PCI_FIXED) ||
+				!pci_dev_movable_bars_supported(dev_res->dev);
+		} else {
+			int b_res_idx = pci_get_bridge_resource_idx(res);
+			struct resource *fixed_res =
+				&dev_res->dev->subordinate->immovable_range[b_res_idx];
+
+			is_fixed = (fixed_res->start < fixed_res->end);
+		}
+
+		if (assign_fixed_resources == step && !is_fixed)
+			continue;
+		else if (assign_float_resources == step && is_fixed)
+			continue;
+
+		if (pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head) {
 				/*
 				 * If the failed resource is a ROM BAR and
@@ -1336,7 +1374,7 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
-static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
@@ -1353,35 +1391,14 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 		    !(r->flags & IORESOURCE_PREFETCH))
 			continue;
 
-		if (resource_contains(parent_r, r))
-			request_resource(parent_r, r);
-	}
-}
-
-/*
- * Try to assign any resources marked as IORESOURCE_PCI_FIXED, as they are
- * skipped by pbus_assign_resources_sorted().
- */
-static void pdev_assign_fixed_resources(struct pci_dev *dev)
-{
-	int i;
-
-	for (i = 0; i <  PCI_NUM_RESOURCES; i++) {
-		struct pci_bus *b;
-		struct resource *r = &dev->resource[i];
-
-		if (r->parent || !(r->flags & IORESOURCE_PCI_FIXED) ||
-		    !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-
-		b = dev->bus;
-		while (b && !r->parent) {
-			assign_fixed_resource_on_bus(b, r);
-			b = b->parent;
-			if (!r->parent && pci_movable_bars_enabled())
-				break;
+		if (resource_contains(parent_r, r)) {
+			if (!request_resource(parent_r, r))
+				return 0;
 		}
 	}
+
+	dev_err(&b->dev, "failed to assign immovable %pR\n", r);
+	return -EBUSY;
 }
 
 void __pci_bus_assign_resources(const struct pci_bus *bus,
@@ -1397,8 +1414,6 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 		if (!pci_dev_bars_enabled(dev))
 			continue;
 
-		pdev_assign_fixed_resources(dev);
-
 		b = dev->subordinate;
 		if (!b)
 			continue;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 7357bcc12a53..02620cfac0c7 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -348,8 +348,12 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 	resource_size_t align, size;
 	int ret;
 
-	if (res->flags & IORESOURCE_PCI_FIXED)
-		return 0;
+	if ((res->flags & IORESOURCE_PCI_FIXED) ||
+	    (resno < PCI_BRIDGE_RESOURCES &&
+	     !pci_dev_movable_bars_supported(dev) &&
+	     res->start)) {
+		return assign_fixed_resource_on_bus(dev->bus, res);
+	}
 
 	res->flags |= IORESOURCE_UNSET;
 	align = pci_resource_alignment(dev, res);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

A hotplugged bridge with many hotplug-capable ports may request
reserving more IO space than the machine has. This could be overridden
with the "hpiosize=" kernel argument though.

But when BARs are movable, there are no need to reserve space anymore:
new BARs are allocated not from reserved gaps, but via rearranging the
existing BARs. Requesting a precise amount of space for bridge windows
increases the chances of adding the new bridge successfully.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c7b7e30c6284..7d64ec8e7088 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_HEADER_TYPE_BRIDGE:
 		pci_bridge_check_ranges(bus);
-		if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mem_size = pci_hotplug_mem_size;
 		}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

A hotplugged bridge with many hotplug-capable ports may request
reserving more IO space than the machine has. This could be overridden
with the "hpiosize=" kernel argument though.

But when BARs are movable, there are no need to reserve space anymore:
new BARs are allocated not from reserved gaps, but via rearranging the
existing BARs. Requesting a precise amount of space for bridge windows
increases the chances of adding the new bridge successfully.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c7b7e30c6284..7d64ec8e7088 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_HEADER_TYPE_BRIDGE:
 		pci_bridge_check_ranges(bus);
-		if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mem_size = pci_hotplug_mem_size;
 		}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 17/23] powerpc/pci: Fix crash with enabled movable BARs
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, Alexey Kardashevskiy

Add a check for the UNSET resource flag to skip the released BARs

CC: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8080558d020..362eac42f463 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2986,7 +2986,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
 	int index;
 	int64_t rc;
 
-	if (!res || !res->flags || res->start > res->end)
+	if (!res || !res->flags || res->start > res->end ||
+	    (res->flags & IORESOURCE_UNSET))
 		return;
 
 	if (res->flags & IORESOURCE_IO) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 17/23] powerpc/pci: Fix crash with enabled movable BARs
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Alexey Kardashevskiy, Sergey Miroshnichenko, Bjorn Helgaas, linux

Add a check for the UNSET resource flag to skip the released BARs

CC: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8080558d020..362eac42f463 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2986,7 +2986,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
 	int index;
 	int64_t rc;
 
-	if (!res || !res->flags || res->start > res->end)
+	if (!res || !res->flags || res->start > res->end ||
+	    (res->flags & IORESOURCE_UNSET))
 		return;
 
 	if (res->flags & IORESOURCE_IO) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 18/23] powerpc/pci: Handle BAR movement
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, Oliver O'Halloran

Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
the device's driver supports movable BARs, pcibios_rescan_prepare() will be
called after the device is stopped, and pcibios_rescan_done() - before it
resumes. There are no memory requests to this device between the hooks, so
it it safe to rebuild the EEH address cache during that.

CC: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0b0cf8168b47..18cf13bba228 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
 	pcibios_finish_adding_to_bus(bus);
 }
 EXPORT_SYMBOL_GPL(pci_hp_add_devices);
+
+void pcibios_rescan_prepare(struct pci_dev *pdev)
+{
+	eeh_addr_cache_rmv_dev(pdev);
+}
+
+void pcibios_rescan_done(struct pci_dev *pdev)
+{
+	eeh_addr_cache_insert_dev(pdev);
+}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 18/23] powerpc/pci: Handle BAR movement
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Sergey Miroshnichenko, Oliver O'Halloran, Bjorn Helgaas, linux

Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
the device's driver supports movable BARs, pcibios_rescan_prepare() will be
called after the device is stopped, and pcibios_rescan_done() - before it
resumes. There are no memory requests to this device between the hooks, so
it it safe to rebuild the EEH address cache during that.

CC: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 0b0cf8168b47..18cf13bba228 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
 	pcibios_finish_adding_to_bus(bus);
 }
 EXPORT_SYMBOL_GPL(pci_hp_add_devices);
+
+void pcibios_rescan_prepare(struct pci_dev *pdev)
+{
+	eeh_addr_cache_rmv_dev(pdev);
+}
+
+void pcibios_rescan_done(struct pci_dev *pdev)
+{
+	eeh_addr_cache_insert_dev(pdev);
+}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 19/23] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Assure that MPS settings are set up for bridges which are discovered
during manually triggered rescan via sysfs. This sequence of bridge
init (using pci_rescan_bus()) will be used for pciehp hot-add events
when BARs are movable.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5f52a19738aa..4bb10d27cb3a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3688,7 +3688,7 @@ static void pci_reassign_root_bus_resources(struct pci_bus *root)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
-	struct pci_bus *root = bus;
+	struct pci_bus *root = bus, *child;
 
 	while (!pci_is_root_bus(root))
 		root = root->parent;
@@ -3708,6 +3708,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_assign_unassigned_bus_resources(bus);
 	}
 
+	list_for_each_entry(child, &root->children, node)
+		pcie_bus_configure_settings(child);
+
 	pci_bus_add_devices(bus);
 
 	return max;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 19/23] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

Assure that MPS settings are set up for bridges which are discovered
during manually triggered rescan via sysfs. This sequence of bridge
init (using pci_rescan_bus()) will be used for pciehp hot-add events
when BARs are movable.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5f52a19738aa..4bb10d27cb3a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3688,7 +3688,7 @@ static void pci_reassign_root_bus_resources(struct pci_bus *root)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
-	struct pci_bus *root = bus;
+	struct pci_bus *root = bus, *child;
 
 	while (!pci_is_root_bus(root))
 		root = root->parent;
@@ -3708,6 +3708,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_assign_unassigned_bus_resources(bus);
 	}
 
+	list_for_each_entry(child, &root->children, node)
+		pcie_bus_configure_settings(child);
+
 	pci_bus_add_devices(bus);
 
 	return max;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 20/23] PCI: hotplug: movable BARs: Enable the feature by default
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

This is the last patch in the series which implements the essentials of the
Movable BARs feature, so it is turned by default now. Tested on:

 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer"
   command line argument;
 - POWER8 PowerNV+PHB3 ppc64le with "pci=realloc,pcie_bus_peer2peer".

In case of problems it is still can be overridden by the following command
line option:

    pcie_movable_bars=off

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci-driver.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d11909e79263..a8124e47bf6e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,8 +1688,6 @@ static int __init pci_driver_init(void)
 {
 	int ret;
 
-	pci_add_flags(PCI_IMMOVABLE_BARS);
-
 	ret = bus_register(&pci_bus_type);
 	if (ret)
 		return ret;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 20/23] PCI: hotplug: movable BARs: Enable the feature by default
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

This is the last patch in the series which implements the essentials of the
Movable BARs feature, so it is turned by default now. Tested on:

 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer"
   command line argument;
 - POWER8 PowerNV+PHB3 ppc64le with "pci=realloc,pcie_bus_peer2peer".

In case of problems it is still can be overridden by the following command
line option:

    pcie_movable_bars=off

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci-driver.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d11909e79263..a8124e47bf6e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,8 +1688,6 @@ static int __init pci_driver_init(void)
 {
 	int ret;
 
-	pci_add_flags(PCI_IMMOVABLE_BARS);
-
 	ret = bus_register(&pci_bus_type);
 	if (ret)
 		return ret;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 21/23] nvme-pci: Handle movable BARs
  2019-08-16 16:50 ` Sergey Miroshnichenko
  (?)
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, linux-nvme,
	Christoph Hellwig

Hotplugged devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can by re-mapped.

Tested under the "randrw" mode of the fio tool. Before the hotplugging:

  % sudo cat /proc/iomem
  ...
                3fe800000000-3fe8007fffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:18
                    3fe800000000-3fe8000fffff : 0020:18:00.0
                      3fe800000000-3fe8000fffff : nvme
                    3fe800100000-3fe80017ffff : 0020:18:00.0
  ...

, then another NVME drive was hot-added, so BARs of the 0020:18:00.0 are
moved:

  % sudo cat /proc/iomem
    ...
                3fe800000000-3fe800ffffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:10
                    3fe800000000-3fe800003fff : 0020:10:00.0
                      3fe800000000-3fe800003fff : nvme
                    3fe800010000-3fe80001ffff : 0020:10:00.0
                  3fe800800000-3fe800ffffff : PCI Bus 0020:18
                    3fe800800000-3fe8008fffff : 0020:18:00.0
                      3fe800800000-3fe8008fffff : nvme
                    3fe800900000-3fe80097ffff : 0020:18:00.0
    ...

During the rescanning, both READ and WRITE speeds drop to zero for a while
due to driver's pause, then restore.

Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..a805d80082ca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1645,7 +1645,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar && size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
@@ -2980,6 +2980,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -3049,6 +3066,8 @@ static struct pci_driver nvme_driver = {
 #endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 21/23] nvme-pci: Handle movable BARs
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)


Hotplugged devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can by re-mapped.

Tested under the "randrw" mode of the fio tool. Before the hotplugging:

  % sudo cat /proc/iomem
  ...
                3fe800000000-3fe8007fffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:18
                    3fe800000000-3fe8000fffff : 0020:18:00.0
                      3fe800000000-3fe8000fffff : nvme
                    3fe800100000-3fe80017ffff : 0020:18:00.0
  ...

, then another NVME drive was hot-added, so BARs of the 0020:18:00.0 are
moved:

  % sudo cat /proc/iomem
    ...
                3fe800000000-3fe800ffffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:10
                    3fe800000000-3fe800003fff : 0020:10:00.0
                      3fe800000000-3fe800003fff : nvme
                    3fe800010000-3fe80001ffff : 0020:10:00.0
                  3fe800800000-3fe800ffffff : PCI Bus 0020:18
                    3fe800800000-3fe8008fffff : 0020:18:00.0
                      3fe800800000-3fe8008fffff : nvme
                    3fe800900000-3fe80097ffff : 0020:18:00.0
    ...

During the rescanning, both READ and WRITE speeds drop to zero for a while
due to driver's pause, then restore.

Cc: linux-nvme at lists.infradead.org
Cc: Christoph Hellwig <hch at lst.de>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko at yadro.com>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..a805d80082ca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1645,7 +1645,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar && size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
@@ -2980,6 +2980,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -3049,6 +3066,8 @@ static struct pci_driver nvme_driver = {
 #endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 21/23] nvme-pci: Handle movable BARs
@ 2019-08-16 16:50   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:50 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Sergey Miroshnichenko, Bjorn Helgaas, Christoph Hellwig,
	linux-nvme, linux

Hotplugged devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can by re-mapped.

Tested under the "randrw" mode of the fio tool. Before the hotplugging:

  % sudo cat /proc/iomem
  ...
                3fe800000000-3fe8007fffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:18
                    3fe800000000-3fe8000fffff : 0020:18:00.0
                      3fe800000000-3fe8000fffff : nvme
                    3fe800100000-3fe80017ffff : 0020:18:00.0
  ...

, then another NVME drive was hot-added, so BARs of the 0020:18:00.0 are
moved:

  % sudo cat /proc/iomem
    ...
                3fe800000000-3fe800ffffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:10
                    3fe800000000-3fe800003fff : 0020:10:00.0
                      3fe800000000-3fe800003fff : nvme
                    3fe800010000-3fe80001ffff : 0020:10:00.0
                  3fe800800000-3fe800ffffff : PCI Bus 0020:18
                    3fe800800000-3fe8008fffff : 0020:18:00.0
                      3fe800800000-3fe8008fffff : nvme
                    3fe800900000-3fe80097ffff : 0020:18:00.0
    ...

During the rescanning, both READ and WRITE speeds drop to zero for a while
due to driver's pause, then restore.

Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..a805d80082ca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1645,7 +1645,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar && size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
@@ -2980,6 +2980,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -3049,6 +3066,8 @@ static struct pci_driver nvme_driver = {
 #endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 22/23] PCI/portdrv: Declare support of movable BARs
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:51   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:51 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Switch's BARs are not used by the portdrv driver, but they are still
considered as immovable until the .rescan_prepare() and .rescan_done()
hooks are added. Add these hooks to increase chances to allocate new BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pcie/portdrv_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 0a87091a0800..9dbddc7faaa7 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -197,6 +197,14 @@ static const struct pci_error_handlers pcie_portdrv_err_handler = {
 	.resume = pcie_portdrv_err_resume,
 };
 
+static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+{
+}
+
+static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+{
+}
+
 static struct pci_driver pcie_portdriver = {
 	.name		= "pcieport",
 	.id_table	= &port_pci_ids[0],
@@ -207,6 +215,9 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
+	.rescan_prepare	= pcie_portdrv_rescan_prepare,
+	.rescan_done	= pcie_portdrv_rescan_done,
+
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 22/23] PCI/portdrv: Declare support of movable BARs
@ 2019-08-16 16:51   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:51 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

Switch's BARs are not used by the portdrv driver, but they are still
considered as immovable until the .rescan_prepare() and .rescan_done()
hooks are added. Add these hooks to increase chances to allocate new BARs.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pcie/portdrv_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 0a87091a0800..9dbddc7faaa7 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -197,6 +197,14 @@ static const struct pci_error_handlers pcie_portdrv_err_handler = {
 	.resume = pcie_portdrv_err_resume,
 };
 
+static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+{
+}
+
+static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+{
+}
+
 static struct pci_driver pcie_portdriver = {
 	.name		= "pcieport",
 	.id_table	= &port_pci_ids[0],
@@ -207,6 +215,9 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
+	.rescan_prepare	= pcie_portdrv_rescan_prepare,
+	.rescan_done	= pcie_portdrv_rescan_done,
+
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 23/23] PCI: pciehp: movable BARs: Trigger a domain rescan on hp events
  2019-08-16 16:50 ` Sergey Miroshnichenko
@ 2019-08-16 16:51   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:51 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko, Lukas Wunner

With movable BARs, adding a hotplugged device is not local to its bridge
anymore, but it affects the whole domain: BARs, bridge windows and bus
numbers can be substantially rearranged. So instead of trying to fit the
new devices into preallocated reserved gaps, initiate a full domain rescan.

The pci_rescan_bus() covers all the operations of the replaced functions:
 - assigning new bus numbers, as the pci_hp_add_bridge() does it;
 - allocating BARs (pci_assign_unassigned_bridge_resources());
 - cofiguring MPS settings (pcie_bus_configure_settings());
 - binding devices to their drivers (pci_bus_add_devices()).

CC: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/hotplug/pciehp_pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..66c4e6d88fe3 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,6 +58,11 @@ int pciehp_configure_device(struct controller *ctrl)
 		goto out;
 	}
 
+	if (pci_movable_bars_enabled()) {
+		pci_rescan_bus(parent);
+		goto out;
+	}
+
 	for_each_pci_bridge(dev, parent)
 		pci_hp_add_bridge(dev);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5 23/23] PCI: pciehp: movable BARs: Trigger a domain rescan on hp events
@ 2019-08-16 16:51   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-08-16 16:51 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Sergey Miroshnichenko, Lukas Wunner, Bjorn Helgaas, linux

With movable BARs, adding a hotplugged device is not local to its bridge
anymore, but it affects the whole domain: BARs, bridge windows and bus
numbers can be substantially rearranged. So instead of trying to fit the
new devices into preallocated reserved gaps, initiate a full domain rescan.

The pci_rescan_bus() covers all the operations of the replaced functions:
 - assigning new bus numbers, as the pci_hp_add_bridge() does it;
 - allocating BARs (pci_assign_unassigned_bridge_resources());
 - cofiguring MPS settings (pcie_bus_configure_settings());
 - binding devices to their drivers (pci_bus_add_devices()).

CC: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/hotplug/pciehp_pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..66c4e6d88fe3 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,6 +58,11 @@ int pciehp_configure_device(struct controller *ctrl)
 		goto out;
 	}
 
+	if (pci_movable_bars_enabled()) {
+		pci_rescan_bus(parent);
+		goto out;
+	}
+
 	for_each_pci_bridge(dev, parent)
 		pci_hp_add_bridge(dev);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
  2019-08-16 16:50   ` Sergey Miroshnichenko
@ 2019-08-22 12:37     ` Marta Rybczynska
  -1 siblings, 0 replies; 78+ messages in thread
From: Marta Rybczynska @ 2019-08-22 12:37 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, linux, Srinath Mannam



----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko@yadro.com wrote:

> This is a yet another approach to fix an old [1-2] concurrency issue, when:
> - two or more devices are being hot-added into a bridge which was
>   initially empty;
> - a bridge with two or more devices is being hot-added;
> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
> CPU0                                        CPU1
> 
> pci_enable_device_mem()                     pci_enable_device_mem()
>   pci_enable_bridge()                         pci_enable_bridge()
>     pci_is_enabled()
>       return false;
>     atomic_inc_return(enable_cnt)
>     Start actual enabling the bridge
>     ...                                         pci_is_enabled()
>     ...                                           return true;
>     ...                                     Start memory requests <-- FAIL
>     ...
>     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding a per-device mutexes and
> preventing the dev->enable_cnt from from incrementing early.
> 
> CC: Srinath Mannam <srinath.mannam@broadcom.com>
> CC: Marta Rybczynska <mrybczyn@kalray.eu>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> 
> [1]
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
>    [RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
> drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
> drivers/pci/probe.c |  1 +
> include/linux/pci.h |  1 +
> 3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	struct pci_dev *bridge;
> 	int retval;
> 
> +	mutex_lock(&dev->enable_mutex);
> +
> 	bridge = pci_upstream_bridge(dev);
> 	if (bridge)
> 		pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	if (pci_is_enabled(dev)) {
> 		if (!dev->is_busmaster)
> 			pci_set_master(dev);
> +		mutex_unlock(&dev->enable_mutex);
> 		return;
> 	}
> 

This code is used by numerous drivers and when we've seen that issue I was wondering
if there are some use-cases when this (or pci_disable_device) is called with interrupts
disabled. It seems that it shouldn't be, but a BUG_ON or error when someone calls
it this way would be helpful when debugging.

Marta

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
@ 2019-08-22 12:37     ` Marta Rybczynska
  0 siblings, 0 replies; 78+ messages in thread
From: Marta Rybczynska @ 2019-08-22 12:37 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, Srinath Mannam, Bjorn Helgaas, linuxppc-dev, linux



----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko@yadro.com wrote:

> This is a yet another approach to fix an old [1-2] concurrency issue, when:
> - two or more devices are being hot-added into a bridge which was
>   initially empty;
> - a bridge with two or more devices is being hot-added;
> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
> CPU0                                        CPU1
> 
> pci_enable_device_mem()                     pci_enable_device_mem()
>   pci_enable_bridge()                         pci_enable_bridge()
>     pci_is_enabled()
>       return false;
>     atomic_inc_return(enable_cnt)
>     Start actual enabling the bridge
>     ...                                         pci_is_enabled()
>     ...                                           return true;
>     ...                                     Start memory requests <-- FAIL
>     ...
>     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding a per-device mutexes and
> preventing the dev->enable_cnt from from incrementing early.
> 
> CC: Srinath Mannam <srinath.mannam@broadcom.com>
> CC: Marta Rybczynska <mrybczyn@kalray.eu>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> 
> [1]
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
>    [RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
> drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
> drivers/pci/probe.c |  1 +
> include/linux/pci.h |  1 +
> 3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	struct pci_dev *bridge;
> 	int retval;
> 
> +	mutex_lock(&dev->enable_mutex);
> +
> 	bridge = pci_upstream_bridge(dev);
> 	if (bridge)
> 		pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	if (pci_is_enabled(dev)) {
> 		if (!dev->is_busmaster)
> 			pci_set_master(dev);
> +		mutex_unlock(&dev->enable_mutex);
> 		return;
> 	}
> 

This code is used by numerous drivers and when we've seen that issue I was wondering
if there are some use-cases when this (or pci_disable_device) is called with interrupts
disabled. It seems that it shouldn't be, but a BUG_ON or error when someone calls
it this way would be helpful when debugging.

Marta

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
  2019-08-16 16:50   ` Sergey Miroshnichenko
@ 2019-09-04  5:37     ` Oliver O'Halloran
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-04  5:37 UTC (permalink / raw)
  To: Sergey Miroshnichenko, linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux

On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
> called after the device is stopped, and pcibios_rescan_done() - before it
> resumes. There are no memory requests to this device between the hooks, so
> it it safe to rebuild the EEH address cache during that.
> 
> CC: Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 0b0cf8168b47..18cf13bba228 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
>  	pcibios_finish_adding_to_bus(bus);
>  }
>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
> +
> +void pcibios_rescan_prepare(struct pci_dev *pdev)
> +{
> +	eeh_addr_cache_rmv_dev(pdev);
> +}
> +
> +void pcibios_rescan_done(struct pci_dev *pdev)
> +{
> +	eeh_addr_cache_insert_dev(pdev);
> +}

Is this actually sufficent? The PE number for a device is largely
determined by the location of the MMIO BARs. If you move a BAR far
enough the PE number stored in the eeh_pe would need to be updated as
well.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
@ 2019-09-04  5:37     ` Oliver O'Halloran
  0 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-04  5:37 UTC (permalink / raw)
  To: Sergey Miroshnichenko, linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux

On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
> called after the device is stopped, and pcibios_rescan_done() - before it
> resumes. There are no memory requests to this device between the hooks, so
> it it safe to rebuild the EEH address cache during that.
> 
> CC: Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 0b0cf8168b47..18cf13bba228 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
>  	pcibios_finish_adding_to_bus(bus);
>  }
>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
> +
> +void pcibios_rescan_prepare(struct pci_dev *pdev)
> +{
> +	eeh_addr_cache_rmv_dev(pdev);
> +}
> +
> +void pcibios_rescan_done(struct pci_dev *pdev)
> +{
> +	eeh_addr_cache_insert_dev(pdev);
> +}

Is this actually sufficent? The PE number for a device is largely
determined by the location of the MMIO BARs. If you move a BAR far
enough the PE number stored in the eeh_pe would need to be updated as
well.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
  2019-08-16 16:50   ` Sergey Miroshnichenko
@ 2019-09-04  5:42     ` Oliver O'Halloran
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-04  5:42 UTC (permalink / raw)
  To: Sergey Miroshnichenko, linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux

On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> A hotplugged bridge with many hotplug-capable ports may request
> reserving more IO space than the machine has. This could be overridden
> with the "hpiosize=" kernel argument though.
> 
> But when BARs are movable, there are no need to reserve space anymore:
> new BARs are allocated not from reserved gaps, but via rearranging the
> existing BARs. Requesting a precise amount of space for bridge windows
> increases the chances of adding the new bridge successfully.

It wouldn't hurt to reserve some memory space to prevent unnecessary
BAR shuffling at runtime. If it turns out that we need more space then
we can always fall back to re-assigning the whole tree.

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index c7b7e30c6284..7d64ec8e7088 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>  
>  	case PCI_HEADER_TYPE_BRIDGE:
>  		pci_bridge_check_ranges(bus);
> -		if (bus->self->is_hotplug_bridge) {
> +		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>  			additional_io_size  = pci_hotplug_io_size;
>  			additional_mem_size = pci_hotplug_mem_size;
>  		}


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
@ 2019-09-04  5:42     ` Oliver O'Halloran
  0 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-04  5:42 UTC (permalink / raw)
  To: Sergey Miroshnichenko, linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux

On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> A hotplugged bridge with many hotplug-capable ports may request
> reserving more IO space than the machine has. This could be overridden
> with the "hpiosize=" kernel argument though.
> 
> But when BARs are movable, there are no need to reserve space anymore:
> new BARs are allocated not from reserved gaps, but via rearranging the
> existing BARs. Requesting a precise amount of space for bridge windows
> increases the chances of adding the new bridge successfully.

It wouldn't hurt to reserve some memory space to prevent unnecessary
BAR shuffling at runtime. If it turns out that we need more space then
we can always fall back to re-assigning the whole tree.

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index c7b7e30c6284..7d64ec8e7088 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>  
>  	case PCI_HEADER_TYPE_BRIDGE:
>  		pci_bridge_check_ranges(bus);
> -		if (bus->self->is_hotplug_bridge) {
> +		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>  			additional_io_size  = pci_hotplug_io_size;
>  			additional_mem_size = pci_hotplug_mem_size;
>  		}


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
  2019-09-04  5:42     ` Oliver O'Halloran
@ 2019-09-04 11:22       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-04 11:22 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, linux

On 9/4/19 8:42 AM, Oliver O'Halloran wrote:
> On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
>> A hotplugged bridge with many hotplug-capable ports may request
>> reserving more IO space than the machine has. This could be overridden
>> with the "hpiosize=" kernel argument though.
>>
>> But when BARs are movable, there are no need to reserve space anymore:
>> new BARs are allocated not from reserved gaps, but via rearranging the
>> existing BARs. Requesting a precise amount of space for bridge windows
>> increases the chances of adding the new bridge successfully.
> 
> It wouldn't hurt to reserve some memory space to prevent unnecessary
> BAR shuffling at runtime. If it turns out that we need more space then
> we can always fall back to re-assigning the whole tree.
> 

Hi Oliver,

Thank you for your comments!

We had an issue on a x86_64 PC with a small amount of IO space: after
hotplugging an empty bridge of 32 ports even a DEFAULT_HOTPLUG_IO_SIZE
(which is 256) was enough to exhaust the space. So another patch of
this series ("Don't allow added devices to steal resources") had
disabled the BAR allocating for this bridge. It took some time for me
to guess that "hpiosize=0" can solve that.

For MEM and MEM64 spaces it will be harder to reproduce the same, but
there can be a similar problem when fitting between two immovable BARs.

To implement a fallback it would need to add some flag indicating that
allocating this bridge with reserved spaces has failed, so its windows
should be recalculated without reserved spaces - and try again. Maybe
even two types of retrials: with and without the full re-assignment.
We've tried to avoid adding execution paths and code complicatedness.

Serge

>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/setup-bus.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> index c7b7e30c6284..7d64ec8e7088 100644
>> --- a/drivers/pci/setup-bus.c
>> +++ b/drivers/pci/setup-bus.c
>> @@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>>  
>>  	case PCI_HEADER_TYPE_BRIDGE:
>>  		pci_bridge_check_ranges(bus);
>> -		if (bus->self->is_hotplug_bridge) {
>> +		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>>  			additional_io_size  = pci_hotplug_io_size;
>>  			additional_mem_size = pci_hotplug_mem_size;
>>  		}
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space
@ 2019-09-04 11:22       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-04 11:22 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linux-pci, Bjorn Helgaas, linuxppc-dev, linux

On 9/4/19 8:42 AM, Oliver O'Halloran wrote:
> On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
>> A hotplugged bridge with many hotplug-capable ports may request
>> reserving more IO space than the machine has. This could be overridden
>> with the "hpiosize=" kernel argument though.
>>
>> But when BARs are movable, there are no need to reserve space anymore:
>> new BARs are allocated not from reserved gaps, but via rearranging the
>> existing BARs. Requesting a precise amount of space for bridge windows
>> increases the chances of adding the new bridge successfully.
> 
> It wouldn't hurt to reserve some memory space to prevent unnecessary
> BAR shuffling at runtime. If it turns out that we need more space then
> we can always fall back to re-assigning the whole tree.
> 

Hi Oliver,

Thank you for your comments!

We had an issue on a x86_64 PC with a small amount of IO space: after
hotplugging an empty bridge of 32 ports even a DEFAULT_HOTPLUG_IO_SIZE
(which is 256) was enough to exhaust the space. So another patch of
this series ("Don't allow added devices to steal resources") had
disabled the BAR allocating for this bridge. It took some time for me
to guess that "hpiosize=0" can solve that.

For MEM and MEM64 spaces it will be harder to reproduce the same, but
there can be a similar problem when fitting between two immovable BARs.

To implement a fallback it would need to add some flag indicating that
allocating this bridge with reserved spaces has failed, so its windows
should be recalculated without reserved spaces - and try again. Maybe
even two types of retrials: with and without the full re-assignment.
We've tried to avoid adding execution paths and code complicatedness.

Serge

>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/setup-bus.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> index c7b7e30c6284..7d64ec8e7088 100644
>> --- a/drivers/pci/setup-bus.c
>> +++ b/drivers/pci/setup-bus.c
>> @@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>>  
>>  	case PCI_HEADER_TYPE_BRIDGE:
>>  		pci_bridge_check_ranges(bus);
>> -		if (bus->self->is_hotplug_bridge) {
>> +		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>>  			additional_io_size  = pci_hotplug_io_size;
>>  			additional_mem_size = pci_hotplug_mem_size;
>>  		}
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
  2019-09-04  5:37     ` Oliver O'Halloran
@ 2019-09-06 16:24       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-06 16:24 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, linux


[-- Attachment #1.1: Type: text/plain, Size: 2610 bytes --]

Hi Oliver,

On 9/4/19 8:37 AM, Oliver O'Halloran wrote:
> On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
>> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
>> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
>> called after the device is stopped, and pcibios_rescan_done() - before it
>> resumes. There are no memory requests to this device between the hooks, so
>> it it safe to rebuild the EEH address cache during that.
>>
>> CC: Oliver O'Halloran <oohall@gmail.com>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>> index 0b0cf8168b47..18cf13bba228 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
>>  	pcibios_finish_adding_to_bus(bus);
>>  }
>>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
>> +
>> +void pcibios_rescan_prepare(struct pci_dev *pdev)
>> +{
>> +	eeh_addr_cache_rmv_dev(pdev);
>> +}
>> +
>> +void pcibios_rescan_done(struct pci_dev *pdev)
>> +{
>> +	eeh_addr_cache_insert_dev(pdev);
>> +}
> 
> Is this actually sufficent? The PE number for a device is largely
> determined by the location of the MMIO BARs. If you move a BAR far
> enough the PE number stored in the eeh_pe would need to be updated as
> well.
> 

Thanks for the hint! I've checked on our PowerNV: for bridges with MEM
only it allocates PE numbers starting from 0xff down, and when there
are MEM64 - starting from 0 up, one PE number per 4GiB.

PEs are allocated during call to pnv_pci_setup_bridge(), and the I've
added invocation of pci_setup_bridge() after a hotplug event in the
"Recalculate all bridge windows during rescan" patch of this series.

Currently, if a bus already has a PE, pnv_ioda_setup_bus_PE() takes it
and returns. I can see two ways to change it, both are not difficult to
implement:

 a.1) check if MEM64 BARs appeared below the bus - allocate and assign
      a new master PE with required number of slave PEs;

 a.2) if the bus now has more MEM64 than before - check if more slave
      PEs must be reserved;

 b) release all the PEs before a PCI rescan and allocate+assign them
    again after - with this approach the "Hook up the writes to
    PCI_SECONDARY_BUS register" patch may be eliminated.

Do you find any of these suitable?

Serge


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
@ 2019-09-06 16:24       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-06 16:24 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linux-pci, Bjorn Helgaas, linuxppc-dev, linux


[-- Attachment #1.1: Type: text/plain, Size: 2610 bytes --]

Hi Oliver,

On 9/4/19 8:37 AM, Oliver O'Halloran wrote:
> On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
>> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
>> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
>> called after the device is stopped, and pcibios_rescan_done() - before it
>> resumes. There are no memory requests to this device between the hooks, so
>> it it safe to rebuild the EEH address cache during that.
>>
>> CC: Oliver O'Halloran <oohall@gmail.com>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
>> index 0b0cf8168b47..18cf13bba228 100644
>> --- a/arch/powerpc/kernel/pci-hotplug.c
>> +++ b/arch/powerpc/kernel/pci-hotplug.c
>> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
>>  	pcibios_finish_adding_to_bus(bus);
>>  }
>>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
>> +
>> +void pcibios_rescan_prepare(struct pci_dev *pdev)
>> +{
>> +	eeh_addr_cache_rmv_dev(pdev);
>> +}
>> +
>> +void pcibios_rescan_done(struct pci_dev *pdev)
>> +{
>> +	eeh_addr_cache_insert_dev(pdev);
>> +}
> 
> Is this actually sufficent? The PE number for a device is largely
> determined by the location of the MMIO BARs. If you move a BAR far
> enough the PE number stored in the eeh_pe would need to be updated as
> well.
> 

Thanks for the hint! I've checked on our PowerNV: for bridges with MEM
only it allocates PE numbers starting from 0xff down, and when there
are MEM64 - starting from 0 up, one PE number per 4GiB.

PEs are allocated during call to pnv_pci_setup_bridge(), and the I've
added invocation of pci_setup_bridge() after a hotplug event in the
"Recalculate all bridge windows during rescan" patch of this series.

Currently, if a bus already has a PE, pnv_ioda_setup_bus_PE() takes it
and returns. I can see two ways to change it, both are not difficult to
implement:

 a.1) check if MEM64 BARs appeared below the bus - allocate and assign
      a new master PE with required number of slave PEs;

 a.2) if the bus now has more MEM64 than before - check if more slave
      PEs must be reserved;

 b) release all the PEs before a PCI rescan and allocate+assign them
    again after - with this approach the "Hook up the writes to
    PCI_SECONDARY_BUS register" patch may be eliminated.

Do you find any of these suitable?

Serge


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
  2019-09-06 16:24       ` Sergey Miroshnichenko
@ 2019-09-09 14:02         ` Oliver O'Halloran
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-09 14:02 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, Bjorn Helgaas, linux

On Sat, Sep 7, 2019 at 2:25 AM Sergey Miroshnichenko
<s.miroshnichenko@yadro.com> wrote:
>
> Hi Oliver,
>
> On 9/4/19 8:37 AM, Oliver O'Halloran wrote:
> > On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> >> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
> >> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
> >> called after the device is stopped, and pcibios_rescan_done() - before it
> >> resumes. There are no memory requests to this device between the hooks, so
> >> it it safe to rebuild the EEH address cache during that.
> >>
> >> CC: Oliver O'Halloran <oohall@gmail.com>
> >> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> >> ---
> >>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> >> index 0b0cf8168b47..18cf13bba228 100644
> >> --- a/arch/powerpc/kernel/pci-hotplug.c
> >> +++ b/arch/powerpc/kernel/pci-hotplug.c
> >> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
> >>      pcibios_finish_adding_to_bus(bus);
> >>  }
> >>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
> >> +
> >> +void pcibios_rescan_prepare(struct pci_dev *pdev)
> >> +{
> >> +    eeh_addr_cache_rmv_dev(pdev);
> >> +}
> >> +
> >> +void pcibios_rescan_done(struct pci_dev *pdev)
> >> +{
> >> +    eeh_addr_cache_insert_dev(pdev);
> >> +}
> >
> > Is this actually sufficent? The PE number for a device is largely
> > determined by the location of the MMIO BARs. If you move a BAR far
> > enough the PE number stored in the eeh_pe would need to be updated as
> > well.
> >
>
> Thanks for the hint! I've checked on our PowerNV: for bridges with MEM
> only it allocates PE numbers starting from 0xff down, and when there
> are MEM64 - starting from 0 up, one PE number per 4GiB.
>
> PEs are allocated during call to pnv_pci_setup_bridge(), and the I've
> added invocation of pci_setup_bridge() after a hotplug event in the
> "Recalculate all bridge windows during rescan" patch of this series.

Sort of.

On PHB3 both the 32bit and the 64bit MMIO windows are split into 256
segments each of which is mapped to a PE number. For the 32bit space
there's a remapping table in hardware that allows arbitrary mapping of
segments to PE numbers, but in the 64bit space the mapping is fixed
with the first segment being PE0, etc. If there's a 64 bit BAR under a
bridge the PE is really "allocated" during the BAR assignment process,
and the setup_bridge() step sets up the EEH state based on that.

It's worth pointing out that this is why the 64bit window is usually
4GB. Bridge windows need to be aligned to a segment boundary to ensure
the devices under them are placed into a unique PE.

> Currently, if a bus already has a PE, pnv_ioda_setup_bus_PE() takes it
> and returns. I can see two ways to change it, both are not difficult to
> implement:
>
>  a.1) check if MEM64 BARs appeared below the bus - allocate and assign
>       a new master PE with required number of slave PEs;
>
>  a.2) if the bus now has more MEM64 than before - check if more slave
>       PEs must be reserved;
>
>  b) release all the PEs before a PCI rescan and allocate+assign them
>     again after - with this approach the "Hook up the writes to
>     PCI_SECONDARY_BUS register" patch may be eliminated.
>
> Do you find any of these suitable?

I'm not sure a) would work, but even if it does b) is preferable.
There's a lot of strangeness in the powerpc PCI code as-is without
adding extra code paths to deal with. Keeping what happens at hotplug
consistent with what happens at boot will help keep things sane.

FYI in the next few days I'm going to post a series that rips out the
use of pci_dn in powernv and the generic parts of EEH (pseries still
uses it). Assuming Bjorn isn't picking this up for 5.4 you might want
to wait for that before getting too deep into this.

Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 18/23] powerpc/pci: Handle BAR movement
@ 2019-09-09 14:02         ` Oliver O'Halloran
  0 siblings, 0 replies; 78+ messages in thread
From: Oliver O'Halloran @ 2019-09-09 14:02 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, Bjorn Helgaas, linuxppc-dev, linux

On Sat, Sep 7, 2019 at 2:25 AM Sergey Miroshnichenko
<s.miroshnichenko@yadro.com> wrote:
>
> Hi Oliver,
>
> On 9/4/19 8:37 AM, Oliver O'Halloran wrote:
> > On Fri, 2019-08-16 at 19:50 +0300, Sergey Miroshnichenko wrote:
> >> Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
> >> the device's driver supports movable BARs, pcibios_rescan_prepare() will be
> >> called after the device is stopped, and pcibios_rescan_done() - before it
> >> resumes. There are no memory requests to this device between the hooks, so
> >> it it safe to rebuild the EEH address cache during that.
> >>
> >> CC: Oliver O'Halloran <oohall@gmail.com>
> >> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> >> ---
> >>  arch/powerpc/kernel/pci-hotplug.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> >> index 0b0cf8168b47..18cf13bba228 100644
> >> --- a/arch/powerpc/kernel/pci-hotplug.c
> >> +++ b/arch/powerpc/kernel/pci-hotplug.c
> >> @@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
> >>      pcibios_finish_adding_to_bus(bus);
> >>  }
> >>  EXPORT_SYMBOL_GPL(pci_hp_add_devices);
> >> +
> >> +void pcibios_rescan_prepare(struct pci_dev *pdev)
> >> +{
> >> +    eeh_addr_cache_rmv_dev(pdev);
> >> +}
> >> +
> >> +void pcibios_rescan_done(struct pci_dev *pdev)
> >> +{
> >> +    eeh_addr_cache_insert_dev(pdev);
> >> +}
> >
> > Is this actually sufficent? The PE number for a device is largely
> > determined by the location of the MMIO BARs. If you move a BAR far
> > enough the PE number stored in the eeh_pe would need to be updated as
> > well.
> >
>
> Thanks for the hint! I've checked on our PowerNV: for bridges with MEM
> only it allocates PE numbers starting from 0xff down, and when there
> are MEM64 - starting from 0 up, one PE number per 4GiB.
>
> PEs are allocated during call to pnv_pci_setup_bridge(), and the I've
> added invocation of pci_setup_bridge() after a hotplug event in the
> "Recalculate all bridge windows during rescan" patch of this series.

Sort of.

On PHB3 both the 32bit and the 64bit MMIO windows are split into 256
segments each of which is mapped to a PE number. For the 32bit space
there's a remapping table in hardware that allows arbitrary mapping of
segments to PE numbers, but in the 64bit space the mapping is fixed
with the first segment being PE0, etc. If there's a 64 bit BAR under a
bridge the PE is really "allocated" during the BAR assignment process,
and the setup_bridge() step sets up the EEH state based on that.

It's worth pointing out that this is why the 64bit window is usually
4GB. Bridge windows need to be aligned to a segment boundary to ensure
the devices under them are placed into a unique PE.

> Currently, if a bus already has a PE, pnv_ioda_setup_bus_PE() takes it
> and returns. I can see two ways to change it, both are not difficult to
> implement:
>
>  a.1) check if MEM64 BARs appeared below the bus - allocate and assign
>       a new master PE with required number of slave PEs;
>
>  a.2) if the bus now has more MEM64 than before - check if more slave
>       PEs must be reserved;
>
>  b) release all the PEs before a PCI rescan and allocate+assign them
>     again after - with this approach the "Hook up the writes to
>     PCI_SECONDARY_BUS register" patch may be eliminated.
>
> Do you find any of these suitable?

I'm not sure a) would work, but even if it does b) is preferable.
There's a lot of strangeness in the powerpc PCI code as-is without
adding extra code paths to deal with. Keeping what happens at hotplug
consistent with what happens at boot will help keep things sane.

FYI in the next few days I'm going to post a series that rips out the
use of pci_dn in powernv and the generic parts of EEH (pseries still
uses it). Assuming Bjorn isn't picking this up for 5.4 you might want
to wait for that before getting too deep into this.

Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
  2019-08-16 16:50   ` Sergey Miroshnichenko
@ 2019-09-27 21:59     ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-09-27 21:59 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Srinath Mannam, Marta Rybczynska

On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
> This is a yet another approach to fix an old [1-2] concurrency issue, when:
>  - two or more devices are being hot-added into a bridge which was
>    initially empty;
>  - a bridge with two or more devices is being hot-added;
>  - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
>  CPU0                                        CPU1
> 
>  pci_enable_device_mem()                     pci_enable_device_mem()
>    pci_enable_bridge()                         pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                         pci_is_enabled()
>      ...                                           return true;
>      ...                                     Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding a per-device mutexes and
> preventing the dev->enable_cnt from from incrementing early.

This isn't directly related to the movable BARs functionality; is it
here because you see the problem more frequently when moving BARs?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
@ 2019-09-27 21:59     ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-09-27 21:59 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: Marta Rybczynska, linux-pci, Srinath Mannam, linuxppc-dev, linux

On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
> This is a yet another approach to fix an old [1-2] concurrency issue, when:
>  - two or more devices are being hot-added into a bridge which was
>    initially empty;
>  - a bridge with two or more devices is being hot-added;
>  - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
>  CPU0                                        CPU1
> 
>  pci_enable_device_mem()                     pci_enable_device_mem()
>    pci_enable_bridge()                         pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                         pci_is_enabled()
>      ...                                           return true;
>      ...                                     Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding a per-device mutexes and
> preventing the dev->enable_cnt from from incrementing early.

This isn't directly related to the movable BARs functionality; is it
here because you see the problem more frequently when moving BARs?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2019-08-16 16:50   ` Sergey Miroshnichenko
  (?)
@ 2019-09-27 22:01   ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-09-27 22:01 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Fri, Aug 16, 2019 at 07:50:40PM +0300, Sergey Miroshnichenko wrote:
> The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
> updated not only when enabling the bridge for the first time, but also if a
> hotplugged device requests these types of resources.

Yeah, this assumption that pci_is_enabled() means PCI_COMMAND_IO and
PCI_COMMAND_MEMORY are set correctly even though we may now need
*different* settings than when we incremented pdev->enable_cnt is
quite broken.

> Originally these bits were set by the pci_enable_device_flags() only, which
> exits early if the bridge is already pci_is_enabled(). So if the bridge was
> empty initially (an edge case), then hotplugged devices fail to IO/MEM.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7f8c354e644..61d951766087 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1652,6 +1652,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  		pci_enable_bridge(bridge);
>  
>  	if (pci_is_enabled(dev)) {
> +		int i, bars = 0;
> +
> +		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
> +			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
> +				bars |= (1 << i);
> +		}
> +		do_pci_enable_device(dev, bars);
> +
>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
>  		mutex_unlock(&dev->enable_mutex);
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-08-16 16:50   ` Sergey Miroshnichenko
@ 2019-09-27 22:02     ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-09-27 22:02 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Sam Bobroff, Rajat Jain,
	Lukas Wunner, Oliver O'Halloran, David Laight

On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> When hot-adding a device, the bridge may have windows not big enough (or
> fragmented too much) for newly requested BARs to fit in. And expanding
> these bridge windows may be impossible because blocked by "neighboring"
> BARs and bridge windows.
> 
> Still, it may be possible to allocate a memory region for new BARs with the
> following procedure:
> 
> 1) notify all the drivers which support movable BARs to pause and release
>    the BARs; the rest of the drivers are guaranteed that their devices will
>    not get BARs moved;
> 
> 2) release all the bridge windows except of root bridges;
> 
> 3) try to recalculate new bridge windows that will fit all the BAR types:
>    - fixed;
>    - immovable;
>    - movable;
>    - newly requested by hot-added devices;
> 
> 4) if the previous step fails, disable BARs for one of the hot-added
>    devices and retry from step 3;
> 
> 5) notify the drivers, so they remap BARs and resume.

You don't do the actual recalculation in *this* patch, but since you
mention the procedure here, are we confident that we never make things
worse?

It's possible that a hot-add will trigger this attempt to move things
around, and it's possible that we won't find space for the new device
even if we move things around.  But are we certain that every device
that worked *before* the hot-add will still work *afterwards*?

Much of the assignment was probably done by the BIOS using different
algorithms than Linux has, so I think there's some chance that the
BIOS did a better job and if we lose that BIOS assignment, we might
not be able to recreate it.

> This makes the prior reservation of memory by BIOS/bootloader/firmware not
> required anymore for the PCI hotplug.
> 
> Drivers indicate their support of movable BARs by implementing the new
> .rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
> device's activity must be paused during a rescan, and iounmap()+ioremap()
> must be applied to every used BAR.
> 
> The platform also may need to prepare to BAR movement, so new hooks added:
> pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).
> 
> This patch is a preparation for future patches with actual implementation,
> and for now it just does the following:
>  - declares the feature;
>  - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
>  - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
>  - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
>  - adds the PCI_IMMOVABLE_BARS flag.
> 
> The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
> patch of the series. It can be overridden per-arch using this flag or by
> the following command line option:
> 
>     pcie_movable_bars={ off | force }
> 
> CC: Sam Bobroff <sbobroff@linux.ibm.com>
> CC: Rajat Jain <rajatja@google.com>
> CC: Lukas Wunner <lukas@wunner.de>
> CC: Oliver O'Halloran <oohall@gmail.com>
> CC: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  7 ++
>  drivers/pci/pci-driver.c                      |  2 +
>  drivers/pci/pci.c                             | 24 ++++++
>  drivers/pci/pci.h                             |  2 +
>  drivers/pci/probe.c                           | 86 ++++++++++++++++++-
>  include/linux/pci.h                           |  7 ++
>  6 files changed, 126 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 47d981a86e2f..e2274ee87a35 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3526,6 +3526,13 @@
>  		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>  			all PCIe root ports use INTx for all services).
>  
> +	pcie_movable_bars=[PCIE]

This isn't a PCIe-specific feature, it's just a function of whether
drivers are smart enough, so we shouldn't tie it specifically to PCIe.
We could eventually do this for conventional PCI as well.

> +			Override the movable BARs support detection:
> +		off
> +			Disable even if supported by the platform
> +		force
> +			Enable even if not explicitly declared as supported

What's the need for "force"?  If it's possible, I think we should
enable this functionality all the time and just have a disable switch
in case we trip over cases where it doesn't work, e.g., something
like:

  pci=no_movable_bars

>  	pcmv=		[HW,PCMCIA] BadgePAD 4
>  
>  	pd_ignore_unused
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d11909e79263 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
>  {
>  	int ret;
>  
> +	pci_add_flags(PCI_IMMOVABLE_BARS);
> +
>  	ret = bus_register(&pci_bus_type);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 61d951766087..3a504f58ac60 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>  }
>  __setup("pcie_port_pm=", pcie_port_pm_setup);
>  
> +static bool pcie_movable_bars_off;
> +static bool pcie_movable_bars_force;
> +static int __init pcie_movable_bars_setup(char *str)
> +{
> +	if (!strcmp(str, "off"))
> +		pcie_movable_bars_off = true;
> +	else if (!strcmp(str, "force"))
> +		pcie_movable_bars_force = true;
> +	return 1;
> +}
> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
> +
> +bool pci_movable_bars_enabled(void)
> +{
> +	if (pcie_movable_bars_off)
> +		return false;
> +
> +	if (pcie_movable_bars_force)
> +		return true;
> +
> +	return !pci_has_flag(PCI_IMMOVABLE_BARS);
> +}
> +EXPORT_SYMBOL(pci_movable_bars_enabled);

I don't think this needs to be exported, since the only references I
see are from the PCI core.

>  /* Time to wait after a reset for device to become responsive */
>  #define PCIE_RESET_READY_POLL_MS 60000
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d22d1b807701..be7acc477c64 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>  void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
>  
> +bool pci_dev_movable_bars_supported(struct pci_dev *dev);
> +
>  /* PCIe link information */
>  #define PCIE_SPEED2STR(speed) \
>  	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 2e58ece820e8..60e3b48d2251 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>  	return max;
>  }
>  
> +bool pci_dev_movable_bars_supported(struct pci_dev *dev)

This name suggests that movable BARs is a property of the device, but
it's mostly a property of the *driver*.  Most uses are in conjuction
with checking IORESOURCE_PCI_FIXED for some resource, so I think this
might read more naturally and simplify the callers slightly:

  bool pci_dev_movable(struct pci_dev *dev)
  {
    if (dev->driver && dev->driver->rescan_prepare)
      return true;

    if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
      return false;

    if (!dev->driver)
      return true;

    return false;
  }

  bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
  {
    if (res->flags & IORESOURCE_PCI_FIXED)
      return false;

    return pci_dev_movable(dev);
  }

I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
you add a comment about why that's needed?  Obviously we can't move
the 0xa0000 legacy frame buffer because I think devices are allowed to
claim that region even if no BAR describes it.  But I would think
*other* BARs of VGA devices could be movable.

> +{
> +	if (!dev)
> +		return false;
> +
> +	if (dev->driver && dev->driver->rescan_prepare)
> +		return true;
> +
> +	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
> +		return false;
> +
> +	return !dev->driver;
> +}
> +
> +void __weak pcibios_rescan_prepare(struct pci_dev *dev)
> +{
> +}
> +
> +void __weak pcibios_rescan_done(struct pci_dev *dev)
> +{
> +}

Can you add the pcibios_rescan_prepare() and pcibios_rescan_done()
stubs at the point where they're needed, i.e., where you add the
powerpc implementations?  We can't see the need for them at this point
in the series.

> +static void pci_bus_rescan_prepare(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	if (bus->self)
> +		pci_config_pm_runtime_get(bus->self);
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child = dev->subordinate;
> +
> +		if (child)
> +			pci_bus_rescan_prepare(child);
> +
> +		if (dev->driver &&
> +		    dev->driver->rescan_prepare) {
> +			dev->driver->rescan_prepare(dev);
> +			pcibios_rescan_prepare(dev);
> +		} else if (pci_dev_movable_bars_supported(dev)) {

The connection between these two conditions is pretty complicated to
figure out.  Can you explain why you need both?

> +			pcibios_rescan_prepare(dev);
> +		}
> +	}
> +}
> +
> +static void pci_bus_rescan_done(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child = dev->subordinate;
> +
> +		if (dev->driver &&
> +		    dev->driver->rescan_done) {
> +			pcibios_rescan_done(dev);
> +			dev->driver->rescan_done(dev);
> +		} else if (pci_dev_movable_bars_supported(dev)) {
> +			pcibios_rescan_done(dev);
> +		}
> +
> +		if (child)
> +			pci_bus_rescan_done(child);
> +	}
> +
> +	if (bus->self)
> +		pci_config_pm_runtime_put(bus->self);
> +}
> +
>  /**
>   * pci_rescan_bus - Scan a PCI bus for devices
>   * @bus: PCI bus to scan
> @@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>  unsigned int pci_rescan_bus(struct pci_bus *bus)
>  {
>  	unsigned int max;
> +	struct pci_bus *root = bus;
> +
> +	while (!pci_is_root_bus(root))
> +		root = root->parent;
> +
> +	if (pci_movable_bars_enabled()) {
> +		pci_bus_rescan_prepare(root);
> +
> +		max = pci_scan_child_bus(root);
> +		pci_assign_unassigned_root_bus_resources(root);
> +
> +		pci_bus_rescan_done(root);
> +	} else {
> +		max = pci_scan_child_bus(bus);
> +		pci_assign_unassigned_bus_resources(bus);
> +	}
>  
> -	max = pci_scan_child_bus(bus);
> -	pci_assign_unassigned_bus_resources(bus);
>  	pci_bus_add_devices(bus);
>  
>  	return max;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index d3a72159722d..e5b5eff05744 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -838,6 +838,8 @@ struct pci_driver {
>  	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
>  	void (*shutdown)(struct pci_dev *dev);
>  	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
> +	void (*rescan_prepare)(struct pci_dev *dev);
> +	void (*rescan_done)(struct pci_dev *dev);
>  	const struct pci_error_handlers *err_handler;
>  	const struct attribute_group **groups;
>  	struct device_driver	driver;
> @@ -924,6 +926,7 @@ enum {
>  	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>  	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>  	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
> +	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */

There are no uses of PCI_IMMOVABLE_BARS left at the end of the series,
so I'd rather not add it if we can avoid it.

>  };
>  
>  /* These external functions are only available when PCI support is enabled */
> @@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
>  void pci_lock_rescan_remove(void);
>  void pci_unlock_rescan_remove(void);
>  
> +void pcibios_rescan_prepare(struct pci_dev *dev);
> +void pcibios_rescan_done(struct pci_dev *dev);
> +
>  /* Vital Product Data routines */
>  ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
>  ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
> @@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>  void pci_setup_bridge(struct pci_bus *bus);
>  resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>  					 unsigned long type);
> +bool pci_movable_bars_enabled(void);

I would really like it if this were simply

  extern bool pci_no_movable_bars;

in drivers/pci/pci.h.  It would default to false since it's
uninitialized, and "pci=no_movable_bars" would set it to true.

We have similar "=off" and "=force" parameters for ASPM and other
things, and it makes the code really hard to analyze.

>  #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>  #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-09-27 22:02     ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-09-27 22:02 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: David Laight, Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> When hot-adding a device, the bridge may have windows not big enough (or
> fragmented too much) for newly requested BARs to fit in. And expanding
> these bridge windows may be impossible because blocked by "neighboring"
> BARs and bridge windows.
> 
> Still, it may be possible to allocate a memory region for new BARs with the
> following procedure:
> 
> 1) notify all the drivers which support movable BARs to pause and release
>    the BARs; the rest of the drivers are guaranteed that their devices will
>    not get BARs moved;
> 
> 2) release all the bridge windows except of root bridges;
> 
> 3) try to recalculate new bridge windows that will fit all the BAR types:
>    - fixed;
>    - immovable;
>    - movable;
>    - newly requested by hot-added devices;
> 
> 4) if the previous step fails, disable BARs for one of the hot-added
>    devices and retry from step 3;
> 
> 5) notify the drivers, so they remap BARs and resume.

You don't do the actual recalculation in *this* patch, but since you
mention the procedure here, are we confident that we never make things
worse?

It's possible that a hot-add will trigger this attempt to move things
around, and it's possible that we won't find space for the new device
even if we move things around.  But are we certain that every device
that worked *before* the hot-add will still work *afterwards*?

Much of the assignment was probably done by the BIOS using different
algorithms than Linux has, so I think there's some chance that the
BIOS did a better job and if we lose that BIOS assignment, we might
not be able to recreate it.

> This makes the prior reservation of memory by BIOS/bootloader/firmware not
> required anymore for the PCI hotplug.
> 
> Drivers indicate their support of movable BARs by implementing the new
> .rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
> device's activity must be paused during a rescan, and iounmap()+ioremap()
> must be applied to every used BAR.
> 
> The platform also may need to prepare to BAR movement, so new hooks added:
> pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).
> 
> This patch is a preparation for future patches with actual implementation,
> and for now it just does the following:
>  - declares the feature;
>  - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
>  - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
>  - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
>  - adds the PCI_IMMOVABLE_BARS flag.
> 
> The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
> patch of the series. It can be overridden per-arch using this flag or by
> the following command line option:
> 
>     pcie_movable_bars={ off | force }
> 
> CC: Sam Bobroff <sbobroff@linux.ibm.com>
> CC: Rajat Jain <rajatja@google.com>
> CC: Lukas Wunner <lukas@wunner.de>
> CC: Oliver O'Halloran <oohall@gmail.com>
> CC: David Laight <David.Laight@ACULAB.COM>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  7 ++
>  drivers/pci/pci-driver.c                      |  2 +
>  drivers/pci/pci.c                             | 24 ++++++
>  drivers/pci/pci.h                             |  2 +
>  drivers/pci/probe.c                           | 86 ++++++++++++++++++-
>  include/linux/pci.h                           |  7 ++
>  6 files changed, 126 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 47d981a86e2f..e2274ee87a35 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3526,6 +3526,13 @@
>  		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>  			all PCIe root ports use INTx for all services).
>  
> +	pcie_movable_bars=[PCIE]

This isn't a PCIe-specific feature, it's just a function of whether
drivers are smart enough, so we shouldn't tie it specifically to PCIe.
We could eventually do this for conventional PCI as well.

> +			Override the movable BARs support detection:
> +		off
> +			Disable even if supported by the platform
> +		force
> +			Enable even if not explicitly declared as supported

What's the need for "force"?  If it's possible, I think we should
enable this functionality all the time and just have a disable switch
in case we trip over cases where it doesn't work, e.g., something
like:

  pci=no_movable_bars

>  	pcmv=		[HW,PCMCIA] BadgePAD 4
>  
>  	pd_ignore_unused
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d11909e79263 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
>  {
>  	int ret;
>  
> +	pci_add_flags(PCI_IMMOVABLE_BARS);
> +
>  	ret = bus_register(&pci_bus_type);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 61d951766087..3a504f58ac60 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>  }
>  __setup("pcie_port_pm=", pcie_port_pm_setup);
>  
> +static bool pcie_movable_bars_off;
> +static bool pcie_movable_bars_force;
> +static int __init pcie_movable_bars_setup(char *str)
> +{
> +	if (!strcmp(str, "off"))
> +		pcie_movable_bars_off = true;
> +	else if (!strcmp(str, "force"))
> +		pcie_movable_bars_force = true;
> +	return 1;
> +}
> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
> +
> +bool pci_movable_bars_enabled(void)
> +{
> +	if (pcie_movable_bars_off)
> +		return false;
> +
> +	if (pcie_movable_bars_force)
> +		return true;
> +
> +	return !pci_has_flag(PCI_IMMOVABLE_BARS);
> +}
> +EXPORT_SYMBOL(pci_movable_bars_enabled);

I don't think this needs to be exported, since the only references I
see are from the PCI core.

>  /* Time to wait after a reset for device to become responsive */
>  #define PCIE_RESET_READY_POLL_MS 60000
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d22d1b807701..be7acc477c64 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>  void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
>  
> +bool pci_dev_movable_bars_supported(struct pci_dev *dev);
> +
>  /* PCIe link information */
>  #define PCIE_SPEED2STR(speed) \
>  	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 2e58ece820e8..60e3b48d2251 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>  	return max;
>  }
>  
> +bool pci_dev_movable_bars_supported(struct pci_dev *dev)

This name suggests that movable BARs is a property of the device, but
it's mostly a property of the *driver*.  Most uses are in conjuction
with checking IORESOURCE_PCI_FIXED for some resource, so I think this
might read more naturally and simplify the callers slightly:

  bool pci_dev_movable(struct pci_dev *dev)
  {
    if (dev->driver && dev->driver->rescan_prepare)
      return true;

    if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
      return false;

    if (!dev->driver)
      return true;

    return false;
  }

  bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
  {
    if (res->flags & IORESOURCE_PCI_FIXED)
      return false;

    return pci_dev_movable(dev);
  }

I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
you add a comment about why that's needed?  Obviously we can't move
the 0xa0000 legacy frame buffer because I think devices are allowed to
claim that region even if no BAR describes it.  But I would think
*other* BARs of VGA devices could be movable.

> +{
> +	if (!dev)
> +		return false;
> +
> +	if (dev->driver && dev->driver->rescan_prepare)
> +		return true;
> +
> +	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
> +		return false;
> +
> +	return !dev->driver;
> +}
> +
> +void __weak pcibios_rescan_prepare(struct pci_dev *dev)
> +{
> +}
> +
> +void __weak pcibios_rescan_done(struct pci_dev *dev)
> +{
> +}

Can you add the pcibios_rescan_prepare() and pcibios_rescan_done()
stubs at the point where they're needed, i.e., where you add the
powerpc implementations?  We can't see the need for them at this point
in the series.

> +static void pci_bus_rescan_prepare(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	if (bus->self)
> +		pci_config_pm_runtime_get(bus->self);
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child = dev->subordinate;
> +
> +		if (child)
> +			pci_bus_rescan_prepare(child);
> +
> +		if (dev->driver &&
> +		    dev->driver->rescan_prepare) {
> +			dev->driver->rescan_prepare(dev);
> +			pcibios_rescan_prepare(dev);
> +		} else if (pci_dev_movable_bars_supported(dev)) {

The connection between these two conditions is pretty complicated to
figure out.  Can you explain why you need both?

> +			pcibios_rescan_prepare(dev);
> +		}
> +	}
> +}
> +
> +static void pci_bus_rescan_done(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child = dev->subordinate;
> +
> +		if (dev->driver &&
> +		    dev->driver->rescan_done) {
> +			pcibios_rescan_done(dev);
> +			dev->driver->rescan_done(dev);
> +		} else if (pci_dev_movable_bars_supported(dev)) {
> +			pcibios_rescan_done(dev);
> +		}
> +
> +		if (child)
> +			pci_bus_rescan_done(child);
> +	}
> +
> +	if (bus->self)
> +		pci_config_pm_runtime_put(bus->self);
> +}
> +
>  /**
>   * pci_rescan_bus - Scan a PCI bus for devices
>   * @bus: PCI bus to scan
> @@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>  unsigned int pci_rescan_bus(struct pci_bus *bus)
>  {
>  	unsigned int max;
> +	struct pci_bus *root = bus;
> +
> +	while (!pci_is_root_bus(root))
> +		root = root->parent;
> +
> +	if (pci_movable_bars_enabled()) {
> +		pci_bus_rescan_prepare(root);
> +
> +		max = pci_scan_child_bus(root);
> +		pci_assign_unassigned_root_bus_resources(root);
> +
> +		pci_bus_rescan_done(root);
> +	} else {
> +		max = pci_scan_child_bus(bus);
> +		pci_assign_unassigned_bus_resources(bus);
> +	}
>  
> -	max = pci_scan_child_bus(bus);
> -	pci_assign_unassigned_bus_resources(bus);
>  	pci_bus_add_devices(bus);
>  
>  	return max;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index d3a72159722d..e5b5eff05744 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -838,6 +838,8 @@ struct pci_driver {
>  	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
>  	void (*shutdown)(struct pci_dev *dev);
>  	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
> +	void (*rescan_prepare)(struct pci_dev *dev);
> +	void (*rescan_done)(struct pci_dev *dev);
>  	const struct pci_error_handlers *err_handler;
>  	const struct attribute_group **groups;
>  	struct device_driver	driver;
> @@ -924,6 +926,7 @@ enum {
>  	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>  	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>  	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
> +	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */

There are no uses of PCI_IMMOVABLE_BARS left at the end of the series,
so I'd rather not add it if we can avoid it.

>  };
>  
>  /* These external functions are only available when PCI support is enabled */
> @@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
>  void pci_lock_rescan_remove(void);
>  void pci_unlock_rescan_remove(void);
>  
> +void pcibios_rescan_prepare(struct pci_dev *dev);
> +void pcibios_rescan_done(struct pci_dev *dev);
> +
>  /* Vital Product Data routines */
>  ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
>  ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
> @@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>  void pci_setup_bridge(struct pci_bus *bus);
>  resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>  					 unsigned long type);
> +bool pci_movable_bars_enabled(void);

I would really like it if this were simply

  extern bool pci_no_movable_bars;

in drivers/pci/pci.h.  It would default to false since it's
uninitialized, and "pci=no_movable_bars" would set it to true.

We have similar "=off" and "=force" parameters for ASPM and other
things, and it makes the code really hard to analyze.

>  #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>  #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-09-27 22:02     ` Bjorn Helgaas
  (?)
@ 2019-09-30  8:44     ` David Laight
  2019-09-30 16:17       ` Sergey Miroshnichenko
  -1 siblings, 1 reply; 78+ messages in thread
From: David Laight @ 2019-09-30  8:44 UTC (permalink / raw)
  To: 'Bjorn Helgaas', Sergey Miroshnichenko
  Cc: Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

From: Bjorn Helgaas
> Sent: 27 September 2019 23:02
> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> > When hot-adding a device, the bridge may have windows not big enough (or
> > fragmented too much) for newly requested BARs to fit in. And expanding
> > these bridge windows may be impossible because blocked by "neighboring"
> > BARs and bridge windows.
> >
> > Still, it may be possible to allocate a memory region for new BARs with the
> > following procedure:
> >
> > 1) notify all the drivers which support movable BARs to pause and release
> >    the BARs; the rest of the drivers are guaranteed that their devices will
> >    not get BARs moved;
> >
> > 2) release all the bridge windows except of root bridges;
> >
> > 3) try to recalculate new bridge windows that will fit all the BAR types:
> >    - fixed;
> >    - immovable;
> >    - movable;
> >    - newly requested by hot-added devices;
> >
> > 4) if the previous step fails, disable BARs for one of the hot-added
> >    devices and retry from step 3;
> >
> > 5) notify the drivers, so they remap BARs and resume.
> 
> You don't do the actual recalculation in *this* patch, but since you
> mention the procedure here, are we confident that we never make things
> worse?
> 
> It's possible that a hot-add will trigger this attempt to move things
> around, and it's possible that we won't find space for the new device
> even if we move things around.  But are we certain that every device
> that worked *before* the hot-add will still work *afterwards*?
> 
> Much of the assignment was probably done by the BIOS using different
> algorithms than Linux has, so I think there's some chance that the
> BIOS did a better job and if we lose that BIOS assignment, we might
> not be able to recreate it.

Yep, removing everything and starting again is probably OTT and most of the churn won't help.

I think you need to work out what can be moved in order to make the required resources available
to each bus and then make the required changes.

In the simplest case you are trying to add resource below a bridge so need to 'shuffle'
everything allocated after that bridge to later addresses (etc).

Many devices that support address reassignment might not need to be moved - so there is
no point remmapping them.

There is also the case when a device that is present but not currently is use could be taken
through a remove+insert sequence in order to change its resources.
Much easier to implement than 'remap while active'.
This would require a call into the driver (than can sleep) to request whether it is idle.
(and probably one at the end if the remove wasn't done).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
  2019-09-27 21:59     ` Bjorn Helgaas
@ 2019-09-30  8:53       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-30  8:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux, Srinath Mannam, Marta Rybczynska

Hello Bjorn,

On 9/28/19 12:59 AM, Bjorn Helgaas wrote:
> On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
>> This is a yet another approach to fix an old [1-2] concurrency issue, when:
>>   - two or more devices are being hot-added into a bridge which was
>>     initially empty;
>>   - a bridge with two or more devices is being hot-added;
>>   - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
>>
>> The problem is that a bridge is reported as enabled before the MEM/IO bits
>> are actually written to the PCI_COMMAND register, so another driver thread
>> starts memory requests through the not-yet-enabled bridge:
>>
>>   CPU0                                        CPU1
>>
>>   pci_enable_device_mem()                     pci_enable_device_mem()
>>     pci_enable_bridge()                         pci_enable_bridge()
>>       pci_is_enabled()
>>         return false;
>>       atomic_inc_return(enable_cnt)
>>       Start actual enabling the bridge
>>       ...                                         pci_is_enabled()
>>       ...                                           return true;
>>       ...                                     Start memory requests <-- FAIL
>>       ...
>>       Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
>> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
>> while enabling upstream bridges"), but adding a per-device mutexes and
>> preventing the dev->enable_cnt from from incrementing early.
> 
> This isn't directly related to the movable BARs functionality; is it
> here because you see the problem more frequently when moving BARs?
> 

First two patches of this series (including this one) are fixes for
the boot and for the hotplug, not related to movable BARs.

Before these fixes, we were suffering from this issue on PowerNV until
commit db2173198b9513f7add8009f225afa1f1c79bcc6 "powerpc/powernv/pci:
Work around races in PCI bridge enabling" was backported to distros:
NVMEs randomly failed to start during system boot. So we've tested the
fixes with that commit reverted.

On x86 the BIOS does pre-enable the bridges, but they were still prone
to races when hot-added or was initially "empty".

Serge

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
@ 2019-09-30  8:53       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-30  8:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marta Rybczynska, linux-pci, Srinath Mannam, linuxppc-dev, linux

Hello Bjorn,

On 9/28/19 12:59 AM, Bjorn Helgaas wrote:
> On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
>> This is a yet another approach to fix an old [1-2] concurrency issue, when:
>>   - two or more devices are being hot-added into a bridge which was
>>     initially empty;
>>   - a bridge with two or more devices is being hot-added;
>>   - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
>>
>> The problem is that a bridge is reported as enabled before the MEM/IO bits
>> are actually written to the PCI_COMMAND register, so another driver thread
>> starts memory requests through the not-yet-enabled bridge:
>>
>>   CPU0                                        CPU1
>>
>>   pci_enable_device_mem()                     pci_enable_device_mem()
>>     pci_enable_bridge()                         pci_enable_bridge()
>>       pci_is_enabled()
>>         return false;
>>       atomic_inc_return(enable_cnt)
>>       Start actual enabling the bridge
>>       ...                                         pci_is_enabled()
>>       ...                                           return true;
>>       ...                                     Start memory requests <-- FAIL
>>       ...
>>       Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
>> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
>> while enabling upstream bridges"), but adding a per-device mutexes and
>> preventing the dev->enable_cnt from from incrementing early.
> 
> This isn't directly related to the movable BARs functionality; is it
> here because you see the problem more frequently when moving BARs?
> 

First two patches of this series (including this one) are fixes for
the boot and for the hotplug, not related to movable BARs.

Before these fixes, we were suffering from this issue on PowerNV until
commit db2173198b9513f7add8009f225afa1f1c79bcc6 "powerpc/powernv/pci:
Work around races in PCI bridge enabling" was backported to distros:
NVMEs randomly failed to start during system boot. So we've tested the
fixes with that commit reverted.

On x86 the BIOS does pre-enable the bridges, but they were still prone
to races when hot-added or was initially "empty".

Serge

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-09-27 22:02     ` Bjorn Helgaas
@ 2019-09-30 12:59       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-30 12:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux, Sam Bobroff, Rajat Jain,
	Lukas Wunner, Oliver O'Halloran, David Laight

Hello Bjorn,

On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
>> When hot-adding a device, the bridge may have windows not big enough (or
>> fragmented too much) for newly requested BARs to fit in. And expanding
>> these bridge windows may be impossible because blocked by "neighboring"
>> BARs and bridge windows.
>>
>> Still, it may be possible to allocate a memory region for new BARs with the
>> following procedure:
>>
>> 1) notify all the drivers which support movable BARs to pause and release
>>     the BARs; the rest of the drivers are guaranteed that their devices will
>>     not get BARs moved;
>>
>> 2) release all the bridge windows except of root bridges;
>>
>> 3) try to recalculate new bridge windows that will fit all the BAR types:
>>     - fixed;
>>     - immovable;
>>     - movable;
>>     - newly requested by hot-added devices;
>>
>> 4) if the previous step fails, disable BARs for one of the hot-added
>>     devices and retry from step 3;
>>
>> 5) notify the drivers, so they remap BARs and resume.
> 
> You don't do the actual recalculation in *this* patch, but since you
> mention the procedure here, are we confident that we never make things
> worse?
> 
> It's possible that a hot-add will trigger this attempt to move things
> around, and it's possible that we won't find space for the new device
> even if we move things around.  But are we certain that every device
> that worked *before* the hot-add will still work *afterwards*?
> 
> Much of the assignment was probably done by the BIOS using different
> algorithms than Linux has, so I think there's some chance that the
> BIOS did a better job and if we lose that BIOS assignment, we might
> not be able to recreate it.
> 

If a hardware has some special constraints on BAR assignment that the
kernel is not aware of yet, the movable BARs may break things after a
hotplug event. So the feature must be disabled there (manually) until
the kernel get support for that special needs.

On x86 we had no choice - most of the machines we used just can't boot
with even an "empty" 16-port switch connected. So we hot-add it after
the boot, then trigger a rescan via 'echo 1 > /sys/bus/pci/rescan'.
And reserved bridge windows wasn't enough, and they can't expand
because are blocked by the next device.

>> This makes the prior reservation of memory by BIOS/bootloader/firmware not
>> required anymore for the PCI hotplug.
>>
>> Drivers indicate their support of movable BARs by implementing the new
>> .rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
>> device's activity must be paused during a rescan, and iounmap()+ioremap()
>> must be applied to every used BAR.
>>
>> The platform also may need to prepare to BAR movement, so new hooks added:
>> pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).
>>
>> This patch is a preparation for future patches with actual implementation,
>> and for now it just does the following:
>>   - declares the feature;
>>   - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
>>   - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
>>   - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
>>   - adds the PCI_IMMOVABLE_BARS flag.
>>
>> The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
>> patch of the series. It can be overridden per-arch using this flag or by
>> the following command line option:
>>
>>      pcie_movable_bars={ off | force }
>>
>> CC: Sam Bobroff <sbobroff@linux.ibm.com>
>> CC: Rajat Jain <rajatja@google.com>
>> CC: Lukas Wunner <lukas@wunner.de>
>> CC: Oliver O'Halloran <oohall@gmail.com>
>> CC: David Laight <David.Laight@ACULAB.COM>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>   .../admin-guide/kernel-parameters.txt         |  7 ++
>>   drivers/pci/pci-driver.c                      |  2 +
>>   drivers/pci/pci.c                             | 24 ++++++
>>   drivers/pci/pci.h                             |  2 +
>>   drivers/pci/probe.c                           | 86 ++++++++++++++++++-
>>   include/linux/pci.h                           |  7 ++
>>   6 files changed, 126 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 47d981a86e2f..e2274ee87a35 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3526,6 +3526,13 @@
>>   		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>>   			all PCIe root ports use INTx for all services).
>>   
>> +	pcie_movable_bars=[PCIE]
> 
> This isn't a PCIe-specific feature, it's just a function of whether
> drivers are smart enough, so we shouldn't tie it specifically to PCIe.
> We could eventually do this for conventional PCI as well.
> 
>> +			Override the movable BARs support detection:
>> +		off
>> +			Disable even if supported by the platform
>> +		force
>> +			Enable even if not explicitly declared as supported
> 
> What's the need for "force"?  If it's possible, I think we should
> enable this functionality all the time and just have a disable switch
> in case we trip over cases where it doesn't work, e.g., something
> like:
> 
>    pci=no_movable_bars
> 

Thanks, I'll simplify that, replace the pci_movable_bars_enabled()
with a bool variable, and remove the flag as you advice.

>>   	pcmv=		[HW,PCMCIA] BadgePAD 4
>>   
>>   	pd_ignore_unused
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index a8124e47bf6e..d11909e79263 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
>>   {
>>   	int ret;
>>   
>> +	pci_add_flags(PCI_IMMOVABLE_BARS);
>> +
>>   	ret = bus_register(&pci_bus_type);
>>   	if (ret)
>>   		return ret;
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 61d951766087..3a504f58ac60 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>>   }
>>   __setup("pcie_port_pm=", pcie_port_pm_setup);
>>   
>> +static bool pcie_movable_bars_off;
>> +static bool pcie_movable_bars_force;
>> +static int __init pcie_movable_bars_setup(char *str)
>> +{
>> +	if (!strcmp(str, "off"))
>> +		pcie_movable_bars_off = true;
>> +	else if (!strcmp(str, "force"))
>> +		pcie_movable_bars_force = true;
>> +	return 1;
>> +}
>> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
>> +
>> +bool pci_movable_bars_enabled(void)
>> +{
>> +	if (pcie_movable_bars_off)
>> +		return false;
>> +
>> +	if (pcie_movable_bars_force)
>> +		return true;
>> +
>> +	return !pci_has_flag(PCI_IMMOVABLE_BARS);
>> +}
>> +EXPORT_SYMBOL(pci_movable_bars_enabled);
> 
> I don't think this needs to be exported, since the only references I
> see are from the PCI core.
> 

Thanks for catching that, this export slipped from older patches, when
the function was used in arch/*

>>   /* Time to wait after a reset for device to become responsive */
>>   #define PCIE_RESET_READY_POLL_MS 60000
>>   
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index d22d1b807701..be7acc477c64 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>>   void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>>   void pci_disable_bridge_window(struct pci_dev *dev);
>>   
>> +bool pci_dev_movable_bars_supported(struct pci_dev *dev);
>> +
>>   /* PCIe link information */
>>   #define PCIE_SPEED2STR(speed) \
>>   	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2e58ece820e8..60e3b48d2251 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>   	return max;
>>   }
>>   
>> +bool pci_dev_movable_bars_supported(struct pci_dev *dev)
> 
> This name suggests that movable BARs is a property of the device, but
> it's mostly a property of the *driver*.  Most uses are in conjuction
> with checking IORESOURCE_PCI_FIXED for some resource, so I think this
> might read more naturally and simplify the callers slightly:
> 
>    bool pci_dev_movable(struct pci_dev *dev)
>    {
>      if (dev->driver && dev->driver->rescan_prepare)
>        return true;
> 
>      if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
>        return false;
> 
>      if (!dev->driver)
>        return true;
> 
>      return false;
>    }
> 
>    bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
>    {
>      if (res->flags & IORESOURCE_PCI_FIXED)
>        return false;
> 
>      return pci_dev_movable(dev);
>    }
> 

Nice, I'll do that, thanks.

Theoretically, there may be a number of identical devices in the
system, some of them are bound to an "immovable" driver, and the rest
is not, which makes them movable. Also modprobe+rmmod may affect
device's BARs mobility. But I agree, the "_supported" suffix was
confusing.

> I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
> you add a comment about why that's needed?  Obviously we can't move
> the 0xa0000 legacy frame buffer because I think devices are allowed to
> claim that region even if no BAR describes it.  But I would think
> *other* BARs of VGA devices could be movable.
> 

Sure, I'll add a comment to the code.

The issue that we are avoiding by that is the "nomodeset" command line
argument, which prevents a video driver from being bound, so the BARs
are seems to be used, but can't be moved, otherwise machines just hang
after hotplug events. That was the only special ugly case we've
spotted during testing. I'll check if it will be enough just to work
around the 0xa0000.

>> +{
>> +	if (!dev)
>> +		return false;
>> +
>> +	if (dev->driver && dev->driver->rescan_prepare)
>> +		return true;
>> +
>> +	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
>> +		return false;
>> +
>> +	return !dev->driver;
>> +}
>> +
>> +void __weak pcibios_rescan_prepare(struct pci_dev *dev)
>> +{
>> +}
>> +
>> +void __weak pcibios_rescan_done(struct pci_dev *dev)
>> +{
>> +}
> 
> Can you add the pcibios_rescan_prepare() and pcibios_rescan_done()
> stubs at the point where they're needed, i.e., where you add the
> powerpc implementations?  We can't see the need for them at this point
> in the series.
> 

Sure.

>> +static void pci_bus_rescan_prepare(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	if (bus->self)
>> +		pci_config_pm_runtime_get(bus->self);
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child = dev->subordinate;
>> +
>> +		if (child)
>> +			pci_bus_rescan_prepare(child);
>> +
>> +		if (dev->driver &&
>> +		    dev->driver->rescan_prepare) {
>> +			dev->driver->rescan_prepare(dev);
>> +			pcibios_rescan_prepare(dev);
>> +		} else if (pci_dev_movable_bars_supported(dev)) {
> 
> The connection between these two conditions is pretty complicated to
> figure out.  Can you explain why you need both?
> 

Yeah, this is what was meant, if would be written clearer:

	if (pci_dev_movable_bars_supported(dev)) {
		if (dev->driver &&
		    dev->driver->rescan_prepare)
			dev->driver->rescan_prepare(dev);

		pcibios_rescan_prepare(dev);
	}

But anyway the pcibios_rescan_prepare(dev) was for PowerNV, and after
consulting with Oliver it was decided to make that a single call for
the whole domain instead of per-device.

>> +			pcibios_rescan_prepare(dev);
>> +		}
>> +	}
>> +}
>> +
>> +static void pci_bus_rescan_done(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child = dev->subordinate;
>> +
>> +		if (dev->driver &&
>> +		    dev->driver->rescan_done) {
>> +			pcibios_rescan_done(dev);
>> +			dev->driver->rescan_done(dev);
>> +		} else if (pci_dev_movable_bars_supported(dev)) {
>> +			pcibios_rescan_done(dev);
>> +		}
>> +
>> +		if (child)
>> +			pci_bus_rescan_done(child);
>> +	}
>> +
>> +	if (bus->self)
>> +		pci_config_pm_runtime_put(bus->self);
>> +}
>> +
>>   /**
>>    * pci_rescan_bus - Scan a PCI bus for devices
>>    * @bus: PCI bus to scan
>> @@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>   unsigned int pci_rescan_bus(struct pci_bus *bus)
>>   {
>>   	unsigned int max;
>> +	struct pci_bus *root = bus;
>> +
>> +	while (!pci_is_root_bus(root))
>> +		root = root->parent;
>> +
>> +	if (pci_movable_bars_enabled()) {
>> +		pci_bus_rescan_prepare(root);
>> +
>> +		max = pci_scan_child_bus(root);
>> +		pci_assign_unassigned_root_bus_resources(root);
>> +
>> +		pci_bus_rescan_done(root);
>> +	} else {
>> +		max = pci_scan_child_bus(bus);
>> +		pci_assign_unassigned_bus_resources(bus);
>> +	}
>>   
>> -	max = pci_scan_child_bus(bus);
>> -	pci_assign_unassigned_bus_resources(bus);
>>   	pci_bus_add_devices(bus);
>>   
>>   	return max;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index d3a72159722d..e5b5eff05744 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -838,6 +838,8 @@ struct pci_driver {
>>   	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
>>   	void (*shutdown)(struct pci_dev *dev);
>>   	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
>> +	void (*rescan_prepare)(struct pci_dev *dev);
>> +	void (*rescan_done)(struct pci_dev *dev);
>>   	const struct pci_error_handlers *err_handler;
>>   	const struct attribute_group **groups;
>>   	struct device_driver	driver;
>> @@ -924,6 +926,7 @@ enum {
>>   	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>>   	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>>   	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
>> +	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */
> 
> There are no uses of PCI_IMMOVABLE_BARS left at the end of the series,
> so I'd rather not add it if we can avoid it.
> 
>>   };
>>   
>>   /* These external functions are only available when PCI support is enabled */
>> @@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
>>   void pci_lock_rescan_remove(void);
>>   void pci_unlock_rescan_remove(void);
>>   
>> +void pcibios_rescan_prepare(struct pci_dev *dev);
>> +void pcibios_rescan_done(struct pci_dev *dev);
>> +
>>   /* Vital Product Data routines */
>>   ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
>>   ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
>> @@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>   void pci_setup_bridge(struct pci_bus *bus);
>>   resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>>   					 unsigned long type);
>> +bool pci_movable_bars_enabled(void);
> 
> I would really like it if this were simply
> 
>    extern bool pci_no_movable_bars;
> 
> in drivers/pci/pci.h.  It would default to false since it's
> uninitialized, and "pci=no_movable_bars" would set it to true.
> 

I have a premonition of platforms that will not support the feature.
Wouldn't be better to put this variable-flag to include/linux/pci.h ,
so code in arch/* can set it, so they could work by default, without
the command line argument?

Serge

> We have similar "=off" and "=force" parameters for ASPM and other
> things, and it makes the code really hard to analyze.
> 
>>   #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>>   #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-09-30 12:59       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-30 12:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: David Laight, Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

Hello Bjorn,

On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
>> When hot-adding a device, the bridge may have windows not big enough (or
>> fragmented too much) for newly requested BARs to fit in. And expanding
>> these bridge windows may be impossible because blocked by "neighboring"
>> BARs and bridge windows.
>>
>> Still, it may be possible to allocate a memory region for new BARs with the
>> following procedure:
>>
>> 1) notify all the drivers which support movable BARs to pause and release
>>     the BARs; the rest of the drivers are guaranteed that their devices will
>>     not get BARs moved;
>>
>> 2) release all the bridge windows except of root bridges;
>>
>> 3) try to recalculate new bridge windows that will fit all the BAR types:
>>     - fixed;
>>     - immovable;
>>     - movable;
>>     - newly requested by hot-added devices;
>>
>> 4) if the previous step fails, disable BARs for one of the hot-added
>>     devices and retry from step 3;
>>
>> 5) notify the drivers, so they remap BARs and resume.
> 
> You don't do the actual recalculation in *this* patch, but since you
> mention the procedure here, are we confident that we never make things
> worse?
> 
> It's possible that a hot-add will trigger this attempt to move things
> around, and it's possible that we won't find space for the new device
> even if we move things around.  But are we certain that every device
> that worked *before* the hot-add will still work *afterwards*?
> 
> Much of the assignment was probably done by the BIOS using different
> algorithms than Linux has, so I think there's some chance that the
> BIOS did a better job and if we lose that BIOS assignment, we might
> not be able to recreate it.
> 

If a hardware has some special constraints on BAR assignment that the
kernel is not aware of yet, the movable BARs may break things after a
hotplug event. So the feature must be disabled there (manually) until
the kernel get support for that special needs.

On x86 we had no choice - most of the machines we used just can't boot
with even an "empty" 16-port switch connected. So we hot-add it after
the boot, then trigger a rescan via 'echo 1 > /sys/bus/pci/rescan'.
And reserved bridge windows wasn't enough, and they can't expand
because are blocked by the next device.

>> This makes the prior reservation of memory by BIOS/bootloader/firmware not
>> required anymore for the PCI hotplug.
>>
>> Drivers indicate their support of movable BARs by implementing the new
>> .rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
>> device's activity must be paused during a rescan, and iounmap()+ioremap()
>> must be applied to every used BAR.
>>
>> The platform also may need to prepare to BAR movement, so new hooks added:
>> pcibios_rescan_prepare(pci_dev) and pcibios_rescan_prepare(pci_dev).
>>
>> This patch is a preparation for future patches with actual implementation,
>> and for now it just does the following:
>>   - declares the feature;
>>   - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
>>   - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
>>   - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
>>   - adds the PCI_IMMOVABLE_BARS flag.
>>
>> The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
>> patch of the series. It can be overridden per-arch using this flag or by
>> the following command line option:
>>
>>      pcie_movable_bars={ off | force }
>>
>> CC: Sam Bobroff <sbobroff@linux.ibm.com>
>> CC: Rajat Jain <rajatja@google.com>
>> CC: Lukas Wunner <lukas@wunner.de>
>> CC: Oliver O'Halloran <oohall@gmail.com>
>> CC: David Laight <David.Laight@ACULAB.COM>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>   .../admin-guide/kernel-parameters.txt         |  7 ++
>>   drivers/pci/pci-driver.c                      |  2 +
>>   drivers/pci/pci.c                             | 24 ++++++
>>   drivers/pci/pci.h                             |  2 +
>>   drivers/pci/probe.c                           | 86 ++++++++++++++++++-
>>   include/linux/pci.h                           |  7 ++
>>   6 files changed, 126 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 47d981a86e2f..e2274ee87a35 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3526,6 +3526,13 @@
>>   		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>>   			all PCIe root ports use INTx for all services).
>>   
>> +	pcie_movable_bars=[PCIE]
> 
> This isn't a PCIe-specific feature, it's just a function of whether
> drivers are smart enough, so we shouldn't tie it specifically to PCIe.
> We could eventually do this for conventional PCI as well.
> 
>> +			Override the movable BARs support detection:
>> +		off
>> +			Disable even if supported by the platform
>> +		force
>> +			Enable even if not explicitly declared as supported
> 
> What's the need for "force"?  If it's possible, I think we should
> enable this functionality all the time and just have a disable switch
> in case we trip over cases where it doesn't work, e.g., something
> like:
> 
>    pci=no_movable_bars
> 

Thanks, I'll simplify that, replace the pci_movable_bars_enabled()
with a bool variable, and remove the flag as you advice.

>>   	pcmv=		[HW,PCMCIA] BadgePAD 4
>>   
>>   	pd_ignore_unused
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index a8124e47bf6e..d11909e79263 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
>>   {
>>   	int ret;
>>   
>> +	pci_add_flags(PCI_IMMOVABLE_BARS);
>> +
>>   	ret = bus_register(&pci_bus_type);
>>   	if (ret)
>>   		return ret;
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 61d951766087..3a504f58ac60 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>>   }
>>   __setup("pcie_port_pm=", pcie_port_pm_setup);
>>   
>> +static bool pcie_movable_bars_off;
>> +static bool pcie_movable_bars_force;
>> +static int __init pcie_movable_bars_setup(char *str)
>> +{
>> +	if (!strcmp(str, "off"))
>> +		pcie_movable_bars_off = true;
>> +	else if (!strcmp(str, "force"))
>> +		pcie_movable_bars_force = true;
>> +	return 1;
>> +}
>> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
>> +
>> +bool pci_movable_bars_enabled(void)
>> +{
>> +	if (pcie_movable_bars_off)
>> +		return false;
>> +
>> +	if (pcie_movable_bars_force)
>> +		return true;
>> +
>> +	return !pci_has_flag(PCI_IMMOVABLE_BARS);
>> +}
>> +EXPORT_SYMBOL(pci_movable_bars_enabled);
> 
> I don't think this needs to be exported, since the only references I
> see are from the PCI core.
> 

Thanks for catching that, this export slipped from older patches, when
the function was used in arch/*

>>   /* Time to wait after a reset for device to become responsive */
>>   #define PCIE_RESET_READY_POLL_MS 60000
>>   
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index d22d1b807701..be7acc477c64 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>>   void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>>   void pci_disable_bridge_window(struct pci_dev *dev);
>>   
>> +bool pci_dev_movable_bars_supported(struct pci_dev *dev);
>> +
>>   /* PCIe link information */
>>   #define PCIE_SPEED2STR(speed) \
>>   	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2e58ece820e8..60e3b48d2251 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -3406,6 +3406,74 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>   	return max;
>>   }
>>   
>> +bool pci_dev_movable_bars_supported(struct pci_dev *dev)
> 
> This name suggests that movable BARs is a property of the device, but
> it's mostly a property of the *driver*.  Most uses are in conjuction
> with checking IORESOURCE_PCI_FIXED for some resource, so I think this
> might read more naturally and simplify the callers slightly:
> 
>    bool pci_dev_movable(struct pci_dev *dev)
>    {
>      if (dev->driver && dev->driver->rescan_prepare)
>        return true;
> 
>      if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
>        return false;
> 
>      if (!dev->driver)
>        return true;
> 
>      return false;
>    }
> 
>    bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
>    {
>      if (res->flags & IORESOURCE_PCI_FIXED)
>        return false;
> 
>      return pci_dev_movable(dev);
>    }
> 

Nice, I'll do that, thanks.

Theoretically, there may be a number of identical devices in the
system, some of them are bound to an "immovable" driver, and the rest
is not, which makes them movable. Also modprobe+rmmod may affect
device's BARs mobility. But I agree, the "_supported" suffix was
confusing.

> I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
> you add a comment about why that's needed?  Obviously we can't move
> the 0xa0000 legacy frame buffer because I think devices are allowed to
> claim that region even if no BAR describes it.  But I would think
> *other* BARs of VGA devices could be movable.
> 

Sure, I'll add a comment to the code.

The issue that we are avoiding by that is the "nomodeset" command line
argument, which prevents a video driver from being bound, so the BARs
are seems to be used, but can't be moved, otherwise machines just hang
after hotplug events. That was the only special ugly case we've
spotted during testing. I'll check if it will be enough just to work
around the 0xa0000.

>> +{
>> +	if (!dev)
>> +		return false;
>> +
>> +	if (dev->driver && dev->driver->rescan_prepare)
>> +		return true;
>> +
>> +	if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
>> +		return false;
>> +
>> +	return !dev->driver;
>> +}
>> +
>> +void __weak pcibios_rescan_prepare(struct pci_dev *dev)
>> +{
>> +}
>> +
>> +void __weak pcibios_rescan_done(struct pci_dev *dev)
>> +{
>> +}
> 
> Can you add the pcibios_rescan_prepare() and pcibios_rescan_done()
> stubs at the point where they're needed, i.e., where you add the
> powerpc implementations?  We can't see the need for them at this point
> in the series.
> 

Sure.

>> +static void pci_bus_rescan_prepare(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	if (bus->self)
>> +		pci_config_pm_runtime_get(bus->self);
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child = dev->subordinate;
>> +
>> +		if (child)
>> +			pci_bus_rescan_prepare(child);
>> +
>> +		if (dev->driver &&
>> +		    dev->driver->rescan_prepare) {
>> +			dev->driver->rescan_prepare(dev);
>> +			pcibios_rescan_prepare(dev);
>> +		} else if (pci_dev_movable_bars_supported(dev)) {
> 
> The connection between these two conditions is pretty complicated to
> figure out.  Can you explain why you need both?
> 

Yeah, this is what was meant, if would be written clearer:

	if (pci_dev_movable_bars_supported(dev)) {
		if (dev->driver &&
		    dev->driver->rescan_prepare)
			dev->driver->rescan_prepare(dev);

		pcibios_rescan_prepare(dev);
	}

But anyway the pcibios_rescan_prepare(dev) was for PowerNV, and after
consulting with Oliver it was decided to make that a single call for
the whole domain instead of per-device.

>> +			pcibios_rescan_prepare(dev);
>> +		}
>> +	}
>> +}
>> +
>> +static void pci_bus_rescan_done(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child = dev->subordinate;
>> +
>> +		if (dev->driver &&
>> +		    dev->driver->rescan_done) {
>> +			pcibios_rescan_done(dev);
>> +			dev->driver->rescan_done(dev);
>> +		} else if (pci_dev_movable_bars_supported(dev)) {
>> +			pcibios_rescan_done(dev);
>> +		}
>> +
>> +		if (child)
>> +			pci_bus_rescan_done(child);
>> +	}
>> +
>> +	if (bus->self)
>> +		pci_config_pm_runtime_put(bus->self);
>> +}
>> +
>>   /**
>>    * pci_rescan_bus - Scan a PCI bus for devices
>>    * @bus: PCI bus to scan
>> @@ -3418,9 +3486,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>   unsigned int pci_rescan_bus(struct pci_bus *bus)
>>   {
>>   	unsigned int max;
>> +	struct pci_bus *root = bus;
>> +
>> +	while (!pci_is_root_bus(root))
>> +		root = root->parent;
>> +
>> +	if (pci_movable_bars_enabled()) {
>> +		pci_bus_rescan_prepare(root);
>> +
>> +		max = pci_scan_child_bus(root);
>> +		pci_assign_unassigned_root_bus_resources(root);
>> +
>> +		pci_bus_rescan_done(root);
>> +	} else {
>> +		max = pci_scan_child_bus(bus);
>> +		pci_assign_unassigned_bus_resources(bus);
>> +	}
>>   
>> -	max = pci_scan_child_bus(bus);
>> -	pci_assign_unassigned_bus_resources(bus);
>>   	pci_bus_add_devices(bus);
>>   
>>   	return max;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index d3a72159722d..e5b5eff05744 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -838,6 +838,8 @@ struct pci_driver {
>>   	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
>>   	void (*shutdown)(struct pci_dev *dev);
>>   	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
>> +	void (*rescan_prepare)(struct pci_dev *dev);
>> +	void (*rescan_done)(struct pci_dev *dev);
>>   	const struct pci_error_handlers *err_handler;
>>   	const struct attribute_group **groups;
>>   	struct device_driver	driver;
>> @@ -924,6 +926,7 @@ enum {
>>   	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>>   	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>>   	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
>> +	PCI_IMMOVABLE_BARS	= 0x00000080,	/* Disable runtime BAR reassign */
> 
> There are no uses of PCI_IMMOVABLE_BARS left at the end of the series,
> so I'd rather not add it if we can avoid it.
> 
>>   };
>>   
>>   /* These external functions are only available when PCI support is enabled */
>> @@ -1266,6 +1269,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus);
>>   void pci_lock_rescan_remove(void);
>>   void pci_unlock_rescan_remove(void);
>>   
>> +void pcibios_rescan_prepare(struct pci_dev *dev);
>> +void pcibios_rescan_done(struct pci_dev *dev);
>> +
>>   /* Vital Product Data routines */
>>   ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
>>   ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
>> @@ -1402,6 +1408,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>   void pci_setup_bridge(struct pci_bus *bus);
>>   resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>>   					 unsigned long type);
>> +bool pci_movable_bars_enabled(void);
> 
> I would really like it if this were simply
> 
>    extern bool pci_no_movable_bars;
> 
> in drivers/pci/pci.h.  It would default to false since it's
> uninitialized, and "pci=no_movable_bars" would set it to true.
> 

I have a premonition of platforms that will not support the feature.
Wouldn't be better to put this variable-flag to include/linux/pci.h ,
so code in arch/* can set it, so they could work by default, without
the command line argument?

Serge

> We have similar "=off" and "=force" parameters for ASPM and other
> things, and it makes the code really hard to analyze.
> 
>>   #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>>   #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-09-30  8:44     ` David Laight
@ 2019-09-30 16:17       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-09-30 16:17 UTC (permalink / raw)
  To: David Laight, 'Bjorn Helgaas'
  Cc: Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

Hello David,

On 9/30/19 11:44 AM, David Laight wrote:
> From: Bjorn Helgaas
>> Sent: 27 September 2019 23:02
>> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
>>> When hot-adding a device, the bridge may have windows not big enough (or
>>> fragmented too much) for newly requested BARs to fit in. And expanding
>>> these bridge windows may be impossible because blocked by "neighboring"
>>> BARs and bridge windows.
>>>
>>> Still, it may be possible to allocate a memory region for new BARs with the
>>> following procedure:
>>>
>>> 1) notify all the drivers which support movable BARs to pause and release
>>>     the BARs; the rest of the drivers are guaranteed that their devices will
>>>     not get BARs moved;
>>>
>>> 2) release all the bridge windows except of root bridges;
>>>
>>> 3) try to recalculate new bridge windows that will fit all the BAR types:
>>>     - fixed;
>>>     - immovable;
>>>     - movable;
>>>     - newly requested by hot-added devices;
>>>
>>> 4) if the previous step fails, disable BARs for one of the hot-added
>>>     devices and retry from step 3;
>>>
>>> 5) notify the drivers, so they remap BARs and resume.
>>
>> You don't do the actual recalculation in *this* patch, but since you
>> mention the procedure here, are we confident that we never make things
>> worse?
>>
>> It's possible that a hot-add will trigger this attempt to move things
>> around, and it's possible that we won't find space for the new device
>> even if we move things around.  But are we certain that every device
>> that worked *before* the hot-add will still work *afterwards*?
>>
>> Much of the assignment was probably done by the BIOS using different
>> algorithms than Linux has, so I think there's some chance that the
>> BIOS did a better job and if we lose that BIOS assignment, we might
>> not be able to recreate it.
> 
> Yep, removing everything and starting again is probably OTT and most of the churn won't help.
> 
> I think you need to work out what can be moved in order to make the required resources available
> to each bus and then make the required changes.
> 
> In the simplest case you are trying to add resource below a bridge so need to 'shuffle'
> everything allocated after that bridge to later addresses (etc).
> 

Thank you for the review and suggestions!

But a bridge window may be fragmented: its total free space is enough
to fit everything, but no sufficient gaps for the new BARs. And this
bridge window may be jammed between two immovable/fixed BARs.

Or there may be lots of empty spaces in lower addresses after un-plugs,
but everything if fixed/immovable on higher addresses.

I've spent some time thinking on an optimization technique which can
be efficient enough (touch as few BARs as possible) with as high
success rate as calculating from scratch - and concluded that it is
not worth it: if only release the "obstructing" BARs and bridge
windows, a hotplug event will affect a half of (n+m) on average, which
is still O(n+m), where n is a number of endpoints, and m is a
number of bridges. But it's still need to resize windows of a root and
other common bridges.

Calculating bridge windows from scratch is relatively straightforward
and fast, so I have just added support for fixed/immovable BARs there
and reused.

> Many devices that support address reassignment might not need to be moved - so there is
> no point remmapping them.
> 

And it's the same algorithm that allocated BARs in first place, so it
will reassign the same BARs for the non-affected part of the topology.

> There is also the case when a device that is present but not currently is use could be taken
> through a remove+insert sequence in order to change its resources.
> Much easier to implement than 'remap while active'.
> This would require a call into the driver (than can sleep) to request whether it is idle.
> (and probably one at the end if the remove wasn't done).
> 

Unbind+rebind the "immovable" drivers of non-opened devices may
increase the probability of successful BAR allocation, but I'm afraid
this will produce some amount of false hotplug-like events in the logs.
Probably also some undesired effects like spikes in power consumption
because of driver initialization.

Best regards,
Serge

> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-09-30 12:59       ` Sergey Miroshnichenko
@ 2019-10-15 22:14         ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-10-15 22:14 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Sam Bobroff, Rajat Jain,
	Lukas Wunner, Oliver O'Halloran, David Laight

On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> Hello Bjorn,
> 
> On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
> > On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> > > When hot-adding a device, the bridge may have windows not big enough (or
> > > fragmented too much) for newly requested BARs to fit in. And expanding
> > > these bridge windows may be impossible because blocked by "neighboring"
> > > BARs and bridge windows.
> > > 
> > > Still, it may be possible to allocate a memory region for new BARs with the
> > > following procedure:
> > > 
> > > 1) notify all the drivers which support movable BARs to pause and release
> > >     the BARs; the rest of the drivers are guaranteed that their devices will
> > >     not get BARs moved;
> > > 
> > > 2) release all the bridge windows except of root bridges;
> > > 
> > > 3) try to recalculate new bridge windows that will fit all the BAR types:
> > >     - fixed;
> > >     - immovable;
> > >     - movable;
> > >     - newly requested by hot-added devices;
> > > 
> > > 4) if the previous step fails, disable BARs for one of the hot-added
> > >     devices and retry from step 3;
> > > 
> > > 5) notify the drivers, so they remap BARs and resume.
> > 
> > You don't do the actual recalculation in *this* patch, but since you
> > mention the procedure here, are we confident that we never make things
> > worse?
> > 
> > It's possible that a hot-add will trigger this attempt to move things
> > around, and it's possible that we won't find space for the new device
> > even if we move things around.  But are we certain that every device
> > that worked *before* the hot-add will still work *afterwards*?
> > 
> > Much of the assignment was probably done by the BIOS using different
> > algorithms than Linux has, so I think there's some chance that the
> > BIOS did a better job and if we lose that BIOS assignment, we might
> > not be able to recreate it.
> 
> If a hardware has some special constraints on BAR assignment that the
> kernel is not aware of yet, the movable BARs may break things after a
> hotplug event. So the feature must be disabled there (manually) until
> the kernel get support for that special needs.

I'm not talking about special constraints on BAR assignment.  (I'm not
sure what those constraints would be -- AFAIK the constraints for a
spec-compliant device are all discoverable via the BAR size and type
(or the Enhanced Allocation capability)).

What I'm concerned about is the case where we boot with a working
assignment, we hot-add a device, we move things around to try to
accommodate the new device, and not only do we fail to find resources
for the new device, we also fail to find a working assignment for the
devices that were present at boot.  We've moved things around from
what BIOS did, and since we use a different algorithm than the BIOS,
there's no guarantee that we'll be able to find the assignment BIOS
did.

> > I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
> > you add a comment about why that's needed?  Obviously we can't move
> > the 0xa0000 legacy frame buffer because I think devices are allowed to
> > claim that region even if no BAR describes it.  But I would think
> > *other* BARs of VGA devices could be movable.
> 
> Sure, I'll add a comment to the code.
> 
> The issue that we are avoiding by that is the "nomodeset" command line
> argument, which prevents a video driver from being bound, so the BARs
> are seems to be used, but can't be moved, otherwise machines just hang
> after hotplug events. That was the only special ugly case we've
> spotted during testing. I'll check if it will be enough just to work
> around the 0xa0000.

"nomodeset" is not really documented and is a funny way to say "don't
bind video drivers that know about it", but OK.  Thanks for checking
on the other BARs.

> > > +bool pci_movable_bars_enabled(void);
> > 
> > I would really like it if this were simply
> > 
> >    extern bool pci_no_movable_bars;
> > 
> > in drivers/pci/pci.h.  It would default to false since it's
> > uninitialized, and "pci=no_movable_bars" would set it to true.
> 
> I have a premonition of platforms that will not support the feature.
> Wouldn't be better to put this variable-flag to include/linux/pci.h ,
> so code in arch/* can set it, so they could work by default, without
> the command line argument?

In general I don't see why a platform wouldn't support this since
there really isn't anything platform-specific here.  But if a platform
does need to disable it, having arch code set this flag sounds
reasonable.  We shouldn't make it globally visible until we actually
need that, though.

> > We have similar "=off" and "=force" parameters for ASPM and other
> > things, and it makes the code really hard to analyze.

The "=off" and "=force" things are the biggest things I'd like to
avoid.

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-10-15 22:14         ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-10-15 22:14 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: David Laight, Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> Hello Bjorn,
> 
> On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
> > On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
> > > When hot-adding a device, the bridge may have windows not big enough (or
> > > fragmented too much) for newly requested BARs to fit in. And expanding
> > > these bridge windows may be impossible because blocked by "neighboring"
> > > BARs and bridge windows.
> > > 
> > > Still, it may be possible to allocate a memory region for new BARs with the
> > > following procedure:
> > > 
> > > 1) notify all the drivers which support movable BARs to pause and release
> > >     the BARs; the rest of the drivers are guaranteed that their devices will
> > >     not get BARs moved;
> > > 
> > > 2) release all the bridge windows except of root bridges;
> > > 
> > > 3) try to recalculate new bridge windows that will fit all the BAR types:
> > >     - fixed;
> > >     - immovable;
> > >     - movable;
> > >     - newly requested by hot-added devices;
> > > 
> > > 4) if the previous step fails, disable BARs for one of the hot-added
> > >     devices and retry from step 3;
> > > 
> > > 5) notify the drivers, so they remap BARs and resume.
> > 
> > You don't do the actual recalculation in *this* patch, but since you
> > mention the procedure here, are we confident that we never make things
> > worse?
> > 
> > It's possible that a hot-add will trigger this attempt to move things
> > around, and it's possible that we won't find space for the new device
> > even if we move things around.  But are we certain that every device
> > that worked *before* the hot-add will still work *afterwards*?
> > 
> > Much of the assignment was probably done by the BIOS using different
> > algorithms than Linux has, so I think there's some chance that the
> > BIOS did a better job and if we lose that BIOS assignment, we might
> > not be able to recreate it.
> 
> If a hardware has some special constraints on BAR assignment that the
> kernel is not aware of yet, the movable BARs may break things after a
> hotplug event. So the feature must be disabled there (manually) until
> the kernel get support for that special needs.

I'm not talking about special constraints on BAR assignment.  (I'm not
sure what those constraints would be -- AFAIK the constraints for a
spec-compliant device are all discoverable via the BAR size and type
(or the Enhanced Allocation capability)).

What I'm concerned about is the case where we boot with a working
assignment, we hot-add a device, we move things around to try to
accommodate the new device, and not only do we fail to find resources
for the new device, we also fail to find a working assignment for the
devices that were present at boot.  We've moved things around from
what BIOS did, and since we use a different algorithm than the BIOS,
there's no guarantee that we'll be able to find the assignment BIOS
did.

> > I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
> > you add a comment about why that's needed?  Obviously we can't move
> > the 0xa0000 legacy frame buffer because I think devices are allowed to
> > claim that region even if no BAR describes it.  But I would think
> > *other* BARs of VGA devices could be movable.
> 
> Sure, I'll add a comment to the code.
> 
> The issue that we are avoiding by that is the "nomodeset" command line
> argument, which prevents a video driver from being bound, so the BARs
> are seems to be used, but can't be moved, otherwise machines just hang
> after hotplug events. That was the only special ugly case we've
> spotted during testing. I'll check if it will be enough just to work
> around the 0xa0000.

"nomodeset" is not really documented and is a funny way to say "don't
bind video drivers that know about it", but OK.  Thanks for checking
on the other BARs.

> > > +bool pci_movable_bars_enabled(void);
> > 
> > I would really like it if this were simply
> > 
> >    extern bool pci_no_movable_bars;
> > 
> > in drivers/pci/pci.h.  It would default to false since it's
> > uninitialized, and "pci=no_movable_bars" would set it to true.
> 
> I have a premonition of platforms that will not support the feature.
> Wouldn't be better to put this variable-flag to include/linux/pci.h ,
> so code in arch/* can set it, so they could work by default, without
> the command line argument?

In general I don't see why a platform wouldn't support this since
there really isn't anything platform-specific here.  But if a platform
does need to disable it, having arch code set this flag sounds
reasonable.  We shouldn't make it globally visible until we actually
need that, though.

> > We have similar "=off" and "=force" parameters for ASPM and other
> > things, and it makes the code really hard to analyze.

The "=off" and "=force" things are the biggest things I'd like to
avoid.

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-10-15 22:14         ` Bjorn Helgaas
@ 2019-10-16 15:50           ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-10-16 15:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux, Sam Bobroff, Rajat Jain,
	Lukas Wunner, Oliver O'Halloran, David Laight

On 10/16/19 1:14 AM, Bjorn Helgaas wrote:
> On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
>> Hello Bjorn,
>>
>> On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
>>> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
>>>> When hot-adding a device, the bridge may have windows not big enough (or
>>>> fragmented too much) for newly requested BARs to fit in. And expanding
>>>> these bridge windows may be impossible because blocked by "neighboring"
>>>> BARs and bridge windows.
>>>>
>>>> Still, it may be possible to allocate a memory region for new BARs with the
>>>> following procedure:
>>>>
>>>> 1) notify all the drivers which support movable BARs to pause and release
>>>>      the BARs; the rest of the drivers are guaranteed that their devices will
>>>>      not get BARs moved;
>>>>
>>>> 2) release all the bridge windows except of root bridges;
>>>>
>>>> 3) try to recalculate new bridge windows that will fit all the BAR types:
>>>>      - fixed;
>>>>      - immovable;
>>>>      - movable;
>>>>      - newly requested by hot-added devices;
>>>>
>>>> 4) if the previous step fails, disable BARs for one of the hot-added
>>>>      devices and retry from step 3;
>>>>
>>>> 5) notify the drivers, so they remap BARs and resume.
>>>
>>> You don't do the actual recalculation in *this* patch, but since you
>>> mention the procedure here, are we confident that we never make things
>>> worse?
>>>
>>> It's possible that a hot-add will trigger this attempt to move things
>>> around, and it's possible that we won't find space for the new device
>>> even if we move things around.  But are we certain that every device
>>> that worked *before* the hot-add will still work *afterwards*?
>>>
>>> Much of the assignment was probably done by the BIOS using different
>>> algorithms than Linux has, so I think there's some chance that the
>>> BIOS did a better job and if we lose that BIOS assignment, we might
>>> not be able to recreate it.
>>
>> If a hardware has some special constraints on BAR assignment that the
>> kernel is not aware of yet, the movable BARs may break things after a
>> hotplug event. So the feature must be disabled there (manually) until
>> the kernel get support for that special needs.
> 
> I'm not talking about special constraints on BAR assignment.  (I'm not
> sure what those constraints would be -- AFAIK the constraints for a
> spec-compliant device are all discoverable via the BAR size and type
> (or the Enhanced Allocation capability)).
> 
> What I'm concerned about is the case where we boot with a working
> assignment, we hot-add a device, we move things around to try to
> accommodate the new device, and not only do we fail to find resources
> for the new device, we also fail to find a working assignment for the
> devices that were present at boot.  We've moved things around from
> what BIOS did, and since we use a different algorithm than the BIOS,
> there's no guarantee that we'll be able to find the assignment BIOS
> did.
> 

If BAR assignment fails with a hot-added device, these patches will
disable BARs for this device and retry, falling back to the situation
where number of BARs and their size are the same as they were before
the hotplug event.

If all the BARs are immovable - they will just remain on their
positions. Nothing to break here I guess.

If almost all the BARs are immovable and there is one movable BAR,
after releasing the bridge windows there will be a free gap - right
where this movable BAR was. These patches are keeping the size of
released BARs, not requesting the size from the devices again - so the
device can't ask for a larger BAR. The space reserving is disabled by
this patchset, so the kernel will request the same size for the bridge
window containing this movable BAR. So there always will be a gap for
this BAR - in the same location it was before.

Based on these considerations I assume that the kernel is always able
to arrange BARs from scratch if a BIOS was able to make it before.

But! There is an implicit speculation that there will be the same
amount of BARs after the fallback (which is equivalent to a PCI rescan
triggered on unchanged topology). And two week ago I've found that
this is not always true!

I was testing on a "new" x86_64 PC, where BIOS doesn't reserve a space
for SR-IOV BARs (of a network adapter). On the boot, the kernel wasn't
arranging BARs itself - it took values written by the BIOS. And the
bridge window was "jammed" between immovable BARs, so it can't expand.
BARs of this device are also immovable, so the bridge window can't be
moved away. During the PCI rescan, the kernel tried to allocate both
"regular" and SR-IOV BARs - and failed. Even without changes in the
PCI topology.

So in the next version of this series there will be one more patch,
that allows the kernel to ignore BIOS's setting for the "safe" (non-IO
and non-VGA) BARs, so these BARs will be arranged kernel-way - and
also those forgotten by the BIOS.

>>> I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
>>> you add a comment about why that's needed?  Obviously we can't move
>>> the 0xa0000 legacy frame buffer because I think devices are allowed to
>>> claim that region even if no BAR describes it.  But I would think
>>> *other* BARs of VGA devices could be movable.
>>
>> Sure, I'll add a comment to the code.
>>
>> The issue that we are avoiding by that is the "nomodeset" command line
>> argument, which prevents a video driver from being bound, so the BARs
>> are seems to be used, but can't be moved, otherwise machines just hang
>> after hotplug events. That was the only special ugly case we've
>> spotted during testing. I'll check if it will be enough just to work
>> around the 0xa0000.
> 
> "nomodeset" is not really documented and is a funny way to say "don't
> bind video drivers that know about it", but OK.  Thanks for checking
> on the other BARs.
> 

After modifying the code as you advised, it became possible to mark
only some BARs of the device as immovable. So the code is less ugly
now, and it also works for drivers/video/fbdev/efifb.c , which uses
the BAR in a weird way (dev->driver is NULL, but not the res->child):

   static bool pci_dev_movable(struct pci_dev *dev,
                               bool res_has_children)
   {
     if (!pci_can_move_bars)
       return false;

     if (dev->driver && dev->driver->rescan_prepare)
       return true;

     if (!dev->driver && !res_has_children)
       return true;

     return false;
   }

   bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
   {
     if (res->flags & IORESOURCE_PCI_FIXED)
       return false;

     #ifdef CONFIG_X86
     /* Workaround for the legacy VGA memory 0xa0000-0xbffff */
     if (res->start == 0xa0000)
       return false;
     #endif

     return pci_dev_movable(dev, res->child);
   }

>>>> +bool pci_movable_bars_enabled(void);
>>>
>>> I would really like it if this were simply
>>>
>>>     extern bool pci_no_movable_bars;
>>>
>>> in drivers/pci/pci.h.  It would default to false since it's
>>> uninitialized, and "pci=no_movable_bars" would set it to true.
>>
>> I have a premonition of platforms that will not support the feature.
>> Wouldn't be better to put this variable-flag to include/linux/pci.h ,
>> so code in arch/* can set it, so they could work by default, without
>> the command line argument?
> 
> In general I don't see why a platform wouldn't support this since
> there really isn't anything platform-specific here.  But if a platform
> does need to disable it, having arch code set this flag sounds
> reasonable.  We shouldn't make it globally visible until we actually
> need that, though.
> 

On powerpc the Extended Error Handling hardware facility doesn't allow
to shuffle the BARs (without notifying the platform code), otherwise
it reports errors.

I'm working on adding support for powerpc/powernv, but powerpc/pseries
also has EEH, and I don't have a hardware to test there.

So the arch/powerpc/platforms/pseries/setup.c will be modified as
follows in the next version of this patchset:

@@ -920,6 +920,8 @@ static void __init pseries_init(void)
  {
  	pr_debug(" -> pseries_init()\n");

+	pci_can_move_bars = false;
+


Best regards,
Serge

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-10-16 15:50           ` Sergey Miroshnichenko
  0 siblings, 0 replies; 78+ messages in thread
From: Sergey Miroshnichenko @ 2019-10-16 15:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: David Laight, Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

On 10/16/19 1:14 AM, Bjorn Helgaas wrote:
> On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
>> Hello Bjorn,
>>
>> On 9/28/19 1:02 AM, Bjorn Helgaas wrote:
>>> On Fri, Aug 16, 2019 at 07:50:41PM +0300, Sergey Miroshnichenko wrote:
>>>> When hot-adding a device, the bridge may have windows not big enough (or
>>>> fragmented too much) for newly requested BARs to fit in. And expanding
>>>> these bridge windows may be impossible because blocked by "neighboring"
>>>> BARs and bridge windows.
>>>>
>>>> Still, it may be possible to allocate a memory region for new BARs with the
>>>> following procedure:
>>>>
>>>> 1) notify all the drivers which support movable BARs to pause and release
>>>>      the BARs; the rest of the drivers are guaranteed that their devices will
>>>>      not get BARs moved;
>>>>
>>>> 2) release all the bridge windows except of root bridges;
>>>>
>>>> 3) try to recalculate new bridge windows that will fit all the BAR types:
>>>>      - fixed;
>>>>      - immovable;
>>>>      - movable;
>>>>      - newly requested by hot-added devices;
>>>>
>>>> 4) if the previous step fails, disable BARs for one of the hot-added
>>>>      devices and retry from step 3;
>>>>
>>>> 5) notify the drivers, so they remap BARs and resume.
>>>
>>> You don't do the actual recalculation in *this* patch, but since you
>>> mention the procedure here, are we confident that we never make things
>>> worse?
>>>
>>> It's possible that a hot-add will trigger this attempt to move things
>>> around, and it's possible that we won't find space for the new device
>>> even if we move things around.  But are we certain that every device
>>> that worked *before* the hot-add will still work *afterwards*?
>>>
>>> Much of the assignment was probably done by the BIOS using different
>>> algorithms than Linux has, so I think there's some chance that the
>>> BIOS did a better job and if we lose that BIOS assignment, we might
>>> not be able to recreate it.
>>
>> If a hardware has some special constraints on BAR assignment that the
>> kernel is not aware of yet, the movable BARs may break things after a
>> hotplug event. So the feature must be disabled there (manually) until
>> the kernel get support for that special needs.
> 
> I'm not talking about special constraints on BAR assignment.  (I'm not
> sure what those constraints would be -- AFAIK the constraints for a
> spec-compliant device are all discoverable via the BAR size and type
> (or the Enhanced Allocation capability)).
> 
> What I'm concerned about is the case where we boot with a working
> assignment, we hot-add a device, we move things around to try to
> accommodate the new device, and not only do we fail to find resources
> for the new device, we also fail to find a working assignment for the
> devices that were present at boot.  We've moved things around from
> what BIOS did, and since we use a different algorithm than the BIOS,
> there's no guarantee that we'll be able to find the assignment BIOS
> did.
> 

If BAR assignment fails with a hot-added device, these patches will
disable BARs for this device and retry, falling back to the situation
where number of BARs and their size are the same as they were before
the hotplug event.

If all the BARs are immovable - they will just remain on their
positions. Nothing to break here I guess.

If almost all the BARs are immovable and there is one movable BAR,
after releasing the bridge windows there will be a free gap - right
where this movable BAR was. These patches are keeping the size of
released BARs, not requesting the size from the devices again - so the
device can't ask for a larger BAR. The space reserving is disabled by
this patchset, so the kernel will request the same size for the bridge
window containing this movable BAR. So there always will be a gap for
this BAR - in the same location it was before.

Based on these considerations I assume that the kernel is always able
to arrange BARs from scratch if a BIOS was able to make it before.

But! There is an implicit speculation that there will be the same
amount of BARs after the fallback (which is equivalent to a PCI rescan
triggered on unchanged topology). And two week ago I've found that
this is not always true!

I was testing on a "new" x86_64 PC, where BIOS doesn't reserve a space
for SR-IOV BARs (of a network adapter). On the boot, the kernel wasn't
arranging BARs itself - it took values written by the BIOS. And the
bridge window was "jammed" between immovable BARs, so it can't expand.
BARs of this device are also immovable, so the bridge window can't be
moved away. During the PCI rescan, the kernel tried to allocate both
"regular" and SR-IOV BARs - and failed. Even without changes in the
PCI topology.

So in the next version of this series there will be one more patch,
that allows the kernel to ignore BIOS's setting for the "safe" (non-IO
and non-VGA) BARs, so these BARs will be arranged kernel-way - and
also those forgotten by the BIOS.

>>> I'm not sure why the PCI_CLASS_DISPLAY_VGA special case is there; can
>>> you add a comment about why that's needed?  Obviously we can't move
>>> the 0xa0000 legacy frame buffer because I think devices are allowed to
>>> claim that region even if no BAR describes it.  But I would think
>>> *other* BARs of VGA devices could be movable.
>>
>> Sure, I'll add a comment to the code.
>>
>> The issue that we are avoiding by that is the "nomodeset" command line
>> argument, which prevents a video driver from being bound, so the BARs
>> are seems to be used, but can't be moved, otherwise machines just hang
>> after hotplug events. That was the only special ugly case we've
>> spotted during testing. I'll check if it will be enough just to work
>> around the 0xa0000.
> 
> "nomodeset" is not really documented and is a funny way to say "don't
> bind video drivers that know about it", but OK.  Thanks for checking
> on the other BARs.
> 

After modifying the code as you advised, it became possible to mark
only some BARs of the device as immovable. So the code is less ugly
now, and it also works for drivers/video/fbdev/efifb.c , which uses
the BAR in a weird way (dev->driver is NULL, but not the res->child):

   static bool pci_dev_movable(struct pci_dev *dev,
                               bool res_has_children)
   {
     if (!pci_can_move_bars)
       return false;

     if (dev->driver && dev->driver->rescan_prepare)
       return true;

     if (!dev->driver && !res_has_children)
       return true;

     return false;
   }

   bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
   {
     if (res->flags & IORESOURCE_PCI_FIXED)
       return false;

     #ifdef CONFIG_X86
     /* Workaround for the legacy VGA memory 0xa0000-0xbffff */
     if (res->start == 0xa0000)
       return false;
     #endif

     return pci_dev_movable(dev, res->child);
   }

>>>> +bool pci_movable_bars_enabled(void);
>>>
>>> I would really like it if this were simply
>>>
>>>     extern bool pci_no_movable_bars;
>>>
>>> in drivers/pci/pci.h.  It would default to false since it's
>>> uninitialized, and "pci=no_movable_bars" would set it to true.
>>
>> I have a premonition of platforms that will not support the feature.
>> Wouldn't be better to put this variable-flag to include/linux/pci.h ,
>> so code in arch/* can set it, so they could work by default, without
>> the command line argument?
> 
> In general I don't see why a platform wouldn't support this since
> there really isn't anything platform-specific here.  But if a platform
> does need to disable it, having arch code set this flag sounds
> reasonable.  We shouldn't make it globally visible until we actually
> need that, though.
> 

On powerpc the Extended Error Handling hardware facility doesn't allow
to shuffle the BARs (without notifying the platform code), otherwise
it reports errors.

I'm working on adding support for powerpc/powernv, but powerpc/pseries
also has EEH, and I don't have a hardware to test there.

So the arch/powerpc/platforms/pseries/setup.c will be modified as
follows in the next version of this patchset:

@@ -920,6 +920,8 @@ static void __init pseries_init(void)
  {
  	pr_debug(" -> pseries_init()\n");

+	pci_can_move_bars = false;
+


Best regards,
Serge

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
  2019-10-16 15:50           ` Sergey Miroshnichenko
@ 2019-10-16 17:29             ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-10-16 17:29 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Sam Bobroff, Rajat Jain,
	Lukas Wunner, Oliver O'Halloran, David Laight

On Wed, Oct 16, 2019 at 06:50:30PM +0300, Sergey Miroshnichenko wrote:
> On 10/16/19 1:14 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> > > On 9/28/19 1:02 AM, Bjorn Helgaas wrote:

> > > > It's possible that a hot-add will trigger this attempt to move things
> > > > around, and it's possible that we won't find space for the new device
> > > > even if we move things around.  But are we certain that every device
> > > > that worked *before* the hot-add will still work *afterwards*?
> > > > 
> > > > Much of the assignment was probably done by the BIOS using different
> > > > algorithms than Linux has, so I think there's some chance that the
> > > > BIOS did a better job and if we lose that BIOS assignment, we might
> > > > not be able to recreate it.
> > > 
> > > If a hardware has some special constraints on BAR assignment that the
> > > kernel is not aware of yet, the movable BARs may break things after a
> > > hotplug event. So the feature must be disabled there (manually) until
> > > the kernel get support for that special needs.
> > 
> > I'm not talking about special constraints on BAR assignment.  (I'm not
> > sure what those constraints would be -- AFAIK the constraints for a
> > spec-compliant device are all discoverable via the BAR size and type
> > (or the Enhanced Allocation capability)).
> > 
> > What I'm concerned about is the case where we boot with a working
> > assignment, we hot-add a device, we move things around to try to
> > accommodate the new device, and not only do we fail to find resources
> > for the new device, we also fail to find a working assignment for the
> > devices that were present at boot.  We've moved things around from
> > what BIOS did, and since we use a different algorithm than the BIOS,
> > there's no guarantee that we'll be able to find the assignment BIOS
> > did.
> 
> If BAR assignment fails with a hot-added device, these patches will
> disable BARs for this device and retry, falling back to the situation
> where number of BARs and their size are the same as they were before
> the hotplug event.
> 
> If all the BARs are immovable - they will just remain on their
> positions. Nothing to break here I guess.
> 
> If almost all the BARs are immovable and there is one movable BAR,
> after releasing the bridge windows there will be a free gap - right
> where this movable BAR was. These patches are keeping the size of
> released BARs, not requesting the size from the devices again - so the
> device can't ask for a larger BAR. The space reserving is disabled by
> this patchset, so the kernel will request the same size for the bridge
> window containing this movable BAR. So there always will be a gap for
> this BAR - in the same location it was before.
> 
> Based on these considerations I assume that the kernel is always able
> to arrange BARs from scratch if a BIOS was able to make it before.
> 
> But! There is an implicit speculation that there will be the same
> amount of BARs after the fallback (which is equivalent to a PCI rescan
> triggered on unchanged topology). And two week ago I've found that
> this is not always true!
> 
> I was testing on a "new" x86_64 PC, where BIOS doesn't reserve a space
> for SR-IOV BARs (of a network adapter). On the boot, the kernel wasn't
> arranging BARs itself - it took values written by the BIOS. And the
> bridge window was "jammed" between immovable BARs, so it can't expand.
> BARs of this device are also immovable, so the bridge window can't be
> moved away. During the PCI rescan, the kernel tried to allocate both
> "regular" and SR-IOV BARs - and failed. Even without changes in the
> PCI topology.
> 
> So in the next version of this series there will be one more patch,
> that allows the kernel to ignore BIOS's setting for the "safe" (non-IO
> and non-VGA) BARs, so these BARs will be arranged kernel-way - and
> also those forgotten by the BIOS.

This still seems a little scary, so I'll probably ask about it again :)

> After modifying the code as you advised, it became possible to mark
> only some BARs of the device as immovable. So the code is less ugly
> now, and it also works for drivers/video/fbdev/efifb.c , which uses
> the BAR in a weird way (dev->driver is NULL, but not the res->child):
> 
>   static bool pci_dev_movable(struct pci_dev *dev,
>                               bool res_has_children)
>   {
>     if (!pci_can_move_bars)
>       return false;
> 
>     if (dev->driver && dev->driver->rescan_prepare)
>       return true;
> 
>     if (!dev->driver && !res_has_children)
>       return true;
> 
>     return false;
>   }
> 
>   bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
>   {
>     if (res->flags & IORESOURCE_PCI_FIXED)
>       return false;
> 
>     #ifdef CONFIG_X86
>     /* Workaround for the legacy VGA memory 0xa0000-0xbffff */
>     if (res->start == 0xa0000)

Nit here; "res->start" is the CPU address, but what you need to check
is the PCI bus address, e.g., something from pcibios_resource_to_bus().
And this is not x86-specific.  0xa0000 is magic on PCI no matter what
the processor architecture.

>       return false;
>     #endif
> 
>     return pci_dev_movable(dev, res->child);
>   }

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature
@ 2019-10-16 17:29             ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2019-10-16 17:29 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: David Laight, Sam Bobroff, linux-pci, linux, Lukas Wunner,
	Oliver O'Halloran, Rajat Jain, linuxppc-dev

On Wed, Oct 16, 2019 at 06:50:30PM +0300, Sergey Miroshnichenko wrote:
> On 10/16/19 1:14 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 30, 2019 at 03:59:25PM +0300, Sergey Miroshnichenko wrote:
> > > On 9/28/19 1:02 AM, Bjorn Helgaas wrote:

> > > > It's possible that a hot-add will trigger this attempt to move things
> > > > around, and it's possible that we won't find space for the new device
> > > > even if we move things around.  But are we certain that every device
> > > > that worked *before* the hot-add will still work *afterwards*?
> > > > 
> > > > Much of the assignment was probably done by the BIOS using different
> > > > algorithms than Linux has, so I think there's some chance that the
> > > > BIOS did a better job and if we lose that BIOS assignment, we might
> > > > not be able to recreate it.
> > > 
> > > If a hardware has some special constraints on BAR assignment that the
> > > kernel is not aware of yet, the movable BARs may break things after a
> > > hotplug event. So the feature must be disabled there (manually) until
> > > the kernel get support for that special needs.
> > 
> > I'm not talking about special constraints on BAR assignment.  (I'm not
> > sure what those constraints would be -- AFAIK the constraints for a
> > spec-compliant device are all discoverable via the BAR size and type
> > (or the Enhanced Allocation capability)).
> > 
> > What I'm concerned about is the case where we boot with a working
> > assignment, we hot-add a device, we move things around to try to
> > accommodate the new device, and not only do we fail to find resources
> > for the new device, we also fail to find a working assignment for the
> > devices that were present at boot.  We've moved things around from
> > what BIOS did, and since we use a different algorithm than the BIOS,
> > there's no guarantee that we'll be able to find the assignment BIOS
> > did.
> 
> If BAR assignment fails with a hot-added device, these patches will
> disable BARs for this device and retry, falling back to the situation
> where number of BARs and their size are the same as they were before
> the hotplug event.
> 
> If all the BARs are immovable - they will just remain on their
> positions. Nothing to break here I guess.
> 
> If almost all the BARs are immovable and there is one movable BAR,
> after releasing the bridge windows there will be a free gap - right
> where this movable BAR was. These patches are keeping the size of
> released BARs, not requesting the size from the devices again - so the
> device can't ask for a larger BAR. The space reserving is disabled by
> this patchset, so the kernel will request the same size for the bridge
> window containing this movable BAR. So there always will be a gap for
> this BAR - in the same location it was before.
> 
> Based on these considerations I assume that the kernel is always able
> to arrange BARs from scratch if a BIOS was able to make it before.
> 
> But! There is an implicit speculation that there will be the same
> amount of BARs after the fallback (which is equivalent to a PCI rescan
> triggered on unchanged topology). And two week ago I've found that
> this is not always true!
> 
> I was testing on a "new" x86_64 PC, where BIOS doesn't reserve a space
> for SR-IOV BARs (of a network adapter). On the boot, the kernel wasn't
> arranging BARs itself - it took values written by the BIOS. And the
> bridge window was "jammed" between immovable BARs, so it can't expand.
> BARs of this device are also immovable, so the bridge window can't be
> moved away. During the PCI rescan, the kernel tried to allocate both
> "regular" and SR-IOV BARs - and failed. Even without changes in the
> PCI topology.
> 
> So in the next version of this series there will be one more patch,
> that allows the kernel to ignore BIOS's setting for the "safe" (non-IO
> and non-VGA) BARs, so these BARs will be arranged kernel-way - and
> also those forgotten by the BIOS.

This still seems a little scary, so I'll probably ask about it again :)

> After modifying the code as you advised, it became possible to mark
> only some BARs of the device as immovable. So the code is less ugly
> now, and it also works for drivers/video/fbdev/efifb.c , which uses
> the BAR in a weird way (dev->driver is NULL, but not the res->child):
> 
>   static bool pci_dev_movable(struct pci_dev *dev,
>                               bool res_has_children)
>   {
>     if (!pci_can_move_bars)
>       return false;
> 
>     if (dev->driver && dev->driver->rescan_prepare)
>       return true;
> 
>     if (!dev->driver && !res_has_children)
>       return true;
> 
>     return false;
>   }
> 
>   bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
>   {
>     if (res->flags & IORESOURCE_PCI_FIXED)
>       return false;
> 
>     #ifdef CONFIG_X86
>     /* Workaround for the legacy VGA memory 0xa0000-0xbffff */
>     if (res->start == 0xa0000)

Nit here; "res->start" is the CPU address, but what you need to check
is the PCI bus address, e.g., something from pcibios_resource_to_bus().
And this is not x86-specific.  0xa0000 is magic on PCI no matter what
the processor architecture.

>       return false;
>     #endif
> 
>     return pci_dev_movable(dev, res->child);
>   }

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2019-10-16 17:31 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16 16:50 [PATCH v5 00/23] PCI: Allow BAR movement during hotplug Sergey Miroshnichenko
2019-08-16 16:50 ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device() Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-22 12:37   ` Marta Rybczynska
2019-08-22 12:37     ` Marta Rybczynska
2019-09-27 21:59   ` Bjorn Helgaas
2019-09-27 21:59     ` Bjorn Helgaas
2019-09-30  8:53     ` Sergey Miroshnichenko
2019-09-30  8:53       ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-09-27 22:01   ` Bjorn Helgaas
2019-08-16 16:50 ` [PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-09-27 22:02   ` Bjorn Helgaas
2019-09-27 22:02     ` Bjorn Helgaas
2019-09-30  8:44     ` David Laight
2019-09-30 16:17       ` Sergey Miroshnichenko
2019-09-30 12:59     ` Sergey Miroshnichenko
2019-09-30 12:59       ` Sergey Miroshnichenko
2019-10-15 22:14       ` Bjorn Helgaas
2019-10-15 22:14         ` Bjorn Helgaas
2019-10-16 15:50         ` Sergey Miroshnichenko
2019-10-16 15:50           ` Sergey Miroshnichenko
2019-10-16 17:29           ` Bjorn Helgaas
2019-10-16 17:29             ` Bjorn Helgaas
2019-08-16 16:50 ` [PATCH v5 04/23] PCI: Define PCI-specific version of the release_child_resources() Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 05/23] PCI: hotplug: movable BARs: Fix reassigning the released bridge windows Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 06/23] PCI: hotplug: movable BARs: Recalculate all bridge windows during rescan Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 07/23] PCI: hotplug: movable BARs: Don't allow added devices to steal resources Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 08/23] PCI: Include fixed and immovable BARs into the bus size calculating Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 09/23] PCI: Prohibit assigning BARs and bridge windows to non-direct parents Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 10/23] PCI: hotplug: movable BARs: Try to assign unassigned resources only once Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 11/23] PCI: hotplug: movable BARs: Calculate immovable parts of bridge windows Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 12/23] PCI: hotplug: movable BARs: Compute limits for relocated " Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 13/23] PCI: Make sure bridge windows include their fixed BARs Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 14/23] PCI: Fix assigning the fixed prefetchable resources Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 15/23] PCI: hotplug: movable BARs: Assign fixed and immovable BARs before others Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-09-04  5:42   ` Oliver O'Halloran
2019-09-04  5:42     ` Oliver O'Halloran
2019-09-04 11:22     ` Sergey Miroshnichenko
2019-09-04 11:22       ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 17/23] powerpc/pci: Fix crash with enabled movable BARs Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 18/23] powerpc/pci: Handle BAR movement Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-09-04  5:37   ` Oliver O'Halloran
2019-09-04  5:37     ` Oliver O'Halloran
2019-09-06 16:24     ` Sergey Miroshnichenko
2019-09-06 16:24       ` Sergey Miroshnichenko
2019-09-09 14:02       ` Oliver O'Halloran
2019-09-09 14:02         ` Oliver O'Halloran
2019-08-16 16:50 ` [PATCH v5 19/23] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 20/23] PCI: hotplug: movable BARs: Enable the feature by default Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50 ` [PATCH v5 21/23] nvme-pci: Handle movable BARs Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:50   ` Sergey Miroshnichenko
2019-08-16 16:51 ` [PATCH v5 22/23] PCI/portdrv: Declare support of " Sergey Miroshnichenko
2019-08-16 16:51   ` Sergey Miroshnichenko
2019-08-16 16:51 ` [PATCH v5 23/23] PCI: pciehp: movable BARs: Trigger a domain rescan on hp events Sergey Miroshnichenko
2019-08-16 16:51   ` Sergey Miroshnichenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.