* [PATCH RFC v4 00/21] PCI: Allow BAR movement during hotplug
@ 2019-03-11 13:31 ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev
  Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko,
	Oliver O'Halloran, Benjamin Herrenschmidt, Sam Bobroff,
	Lukas Wunner, Stewart Smith, Alexey Kardashevskiy, Rajat Jain

If the firmware or kernel has arranged memory for PCIe devices in a way
that doesn't provide enough space for BARs of a new hotplugged device, the
kernel can pause the drivers of the "obstructing" devices and move their
BARs, so new BARs can fit into the freed spaces.

When a driver is un-paused by the kernel after the PCIe rescan, it should
check whether its BARs have moved, and ioremap() them again if needed.

Drivers indicate their support of the feature by implementing the new
rescan_prepare() and rescan_done() hooks in the struct pci_driver. If a
driver doesn't yet support the feature, BARs of its devices will be marked
as immovable by the IORESOURCE_PCI_FIXED flag.
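
The pause / move / resume protocol described above can be sketched in
plain userspace C. This is only an illustrative model, not the kernel
code: the sketch_* names, the single-BAR device and the remap counter are
assumptions made for the example, and a real driver would call iounmap()
and ioremap() where the comment says so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device with a single BAR. */
struct sketch_dev {
	uint64_t bar_start;     /* current bus address of the BAR */
	uint64_t mapped_start;  /* address the driver mapped last time */
	bool paused;            /* driver must not touch the BAR while set */
	unsigned int remaps;    /* how many times the driver re-mapped */
};

/* Hypothetical hooks, loosely modelled on rescan_prepare()/rescan_done(). */
struct sketch_driver {
	void (*rescan_prepare)(struct sketch_dev *dev);
	void (*rescan_done)(struct sketch_dev *dev);
};

static void sketch_prepare(struct sketch_dev *dev)
{
	dev->paused = true;     /* stop all accesses to the BAR */
}

static void sketch_done(struct sketch_dev *dev)
{
	if (dev->bar_start != dev->mapped_start) {
		/* A real driver would iounmap() and ioremap() here. */
		dev->mapped_start = dev->bar_start;
		dev->remaps++;
	}
	dev->paused = false;
}

/* Returns true if the BAR was moved, false if it had to stay in place. */
static bool sketch_rescan(struct sketch_dev *dev,
			  const struct sketch_driver *drv,
			  uint64_t new_start)
{
	/* Without both hooks the BAR is treated as immovable. */
	if (!drv || !drv->rescan_prepare || !drv->rescan_done)
		return false;

	drv->rescan_prepare(dev);
	assert(dev->paused);
	dev->bar_start = new_start;
	drv->rescan_done(dev);
	return true;
}
```

A driver that supplies both hooks sees its BAR moved and re-maps once; a
driver with NULL hooks keeps its BAR where it was, which corresponds to the
IORESOURCE_PCI_FIXED case.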

To re-arrange the BARs and bridge windows this patchset releases all of
them after a rescan and re-assigns them in the same way as during the
initial PCIe topology scan at system boot.

Tested on:
 - x86_64 with "pci=realloc,assign-busses,use_crs pcie_movable_bars=force"
 - POWER8 PowerNV+PHB3 ppc64le with [1] and [2] applied and the following:
   "pci=realloc pcie_movable_bars=force"

Only a few platforms and test cases have been covered so far, so anyone
interested is very welcome to test on their own setups - the more exotic
the better!

This patchset is part of our work on adding support for hotplugging
bridges full of NVME and GPU devices without special requirements such as
a Hot-Plug Controller, reservation of bus numbers or memory regions by
firmware, etc. Future work will be devoted to implementing movable bus
numbers.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-March/186618.html
[2] https://lists.ozlabs.org/pipermail/skiboot/2019-March/013571.html

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergey Miroshnichenko (21):
  PCI: Fix writing invalid BARs during pci_restore_state()
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: Define PCI-specific version of the release_child_resources()
  PCI: hotplug: Add a flag for the movable BARs feature
  PCI: Pause the devices with movable BARs during rescan
  PCI: Wake up bridges during rescan when movable BARs enabled
  nvme-pci: Handle movable BARs
  PCI: Mark immovable BARs with PCI_FIXED
  PCI: Fix assigning of fixed prefetchable resources
  PCI: Release and reassign the root bridge resources during rescan
  PCI: Don't allow hotplugged devices to steal resources
  PCI: Include fixed BARs into the bus size calculating
  PCI: Don't reserve memory for hotplug when enabled movable BARs
  PCI: Allow the failed resources to be reassigned later
  PCI: Calculate fixed areas of bridge windows based on fixed BARs
  PCI: Calculate boundaries for bridge windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: Prioritize fixed BAR assigning over the movable ones
  PCI: pciehp: Add support for the movable BARs feature
  powerpc/pci: Fix crash with enabled movable BARs

 .../admin-guide/kernel-parameters.txt         |   7 +
 arch/powerpc/platforms/powernv/pci-ioda.c     |   3 +-
 drivers/nvme/host/pci.c                       |  29 +-
 drivers/pci/bus.c                             |   7 +-
 drivers/pci/hotplug/pciehp_pci.c              |  14 +-
 drivers/pci/pci.c                             |  60 +++-
 drivers/pci/pci.h                             |  26 ++
 drivers/pci/probe.c                           | 271 +++++++++++++++++-
 drivers/pci/setup-bus.c                       | 245 ++++++++++++++--
 drivers/pci/setup-res.c                       |  43 ++-
 include/linux/pci.h                           |  14 +
 11 files changed, 678 insertions(+), 41 deletions(-)

-- 
2.20.1


* [PATCH RFC v4 01/21] PCI: Fix writing invalid BARs during pci_restore_state()
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If a BAR movement has happened (due to PCIe hotplug) after pci_save_state(),
the saved addresses become outdated. Restore the BARs to the most recently
calculated values, not the ones saved at an arbitrary earlier moment.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7c1b362f599a..f006068be209 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1376,7 +1376,7 @@ static void pci_restore_config_space(struct pci_dev *pdev)
 	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
 		pci_restore_config_space_range(pdev, 10, 15, 0, false);
 		/* Restore BARs before the command register. */
-		pci_restore_config_space_range(pdev, 4, 9, 10, false);
+		pci_restore_bars(pdev);
 		pci_restore_config_space_range(pdev, 0, 3, 0, false);
 	} else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
 		pci_restore_config_space_range(pdev, 12, 15, 0, false);
-- 
2.20.1
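
The difference between restoring a saved snapshot and restoring the most
recently calculated values can be shown with a small userspace model. It
is a sketch under stated assumptions: the sketch_* names and the three
arrays are hypothetical and only stand in for the saved config space, the
kernel's current resource assignments and the device registers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_BARS 6

struct sketch_cfg_dev {
	uint32_t saved_bars[SKETCH_NUM_BARS];   /* snapshot taken at save time */
	uint32_t current_bars[SKETCH_NUM_BARS]; /* latest assigned addresses */
	uint32_t hw_bars[SKETCH_NUM_BARS];      /* what the registers hold */
};

/* Models a BAR assignment: update the bookkeeping and the register. */
static void sketch_move_bar(struct sketch_cfg_dev *dev, int idx, uint32_t addr)
{
	assert(idx >= 0 && idx < SKETCH_NUM_BARS);
	dev->current_bars[idx] = addr;
	dev->hw_bars[idx] = addr;
}

/* Models saving the state: copy the registers into the snapshot. */
static void sketch_save_state(struct sketch_cfg_dev *dev)
{
	for (int i = 0; i < SKETCH_NUM_BARS; i++)
		dev->saved_bars[i] = dev->hw_bars[i];
}

/* Old behaviour: write back the snapshot, which may be stale by now. */
static void sketch_restore_from_snapshot(struct sketch_cfg_dev *dev)
{
	for (int i = 0; i < SKETCH_NUM_BARS; i++)
		dev->hw_bars[i] = dev->saved_bars[i];
}

/* New behaviour: write back the current assignments instead. */
static void sketch_restore_bars(struct sketch_cfg_dev *dev)
{
	for (int i = 0; i < SKETCH_NUM_BARS; i++)
		dev->hw_bars[i] = dev->current_bars[i];
}
```

If a BAR is moved between the save and the restore, the snapshot variant
writes the old address back, while the second variant writes the address
the BAR was moved to.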


* [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

 CPU0                                      CPU1

 pci_enable_device_mem()                   pci_enable_device_mem()
   pci_enable_bridge()                       pci_enable_bridge()
     pci_is_enabled()
       return false;
     atomic_inc_return(enable_cnt)
     Start actual enabling the bridge
     ...                                       pci_is_enabled()
     ...                                         return true;
     ...                                   Start memory requests <-- FAIL
     ...
     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

This patch protects the pci_enable/disable_device() and pci_enable_bridge()
with mutexes.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f006068be209..895201d4c9e6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	struct pci_dev *bridge;
 	int retval;
 
+	mutex_lock(&dev->enable_mutex);
+
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
 		pci_enable_bridge(bridge);
@@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	if (pci_is_enabled(dev)) {
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
+		mutex_unlock(&dev->enable_mutex);
 		return;
 	}
 
@@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_err(dev, "Error enabling bridge (%d), continuing\n",
 			retval);
 	pci_set_master(dev);
+	mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
+	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
+	bool need_lock = !dev->subordinate;
 	int err;
 	int i, bars = 0;
 
@@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
 	}
 
-	if (atomic_inc_return(&dev->enable_cnt) > 1)
+	if (need_lock)
+		mutex_lock(&dev->enable_mutex);
+	if (pci_is_enabled(dev)) {
+		if (need_lock)
+			mutex_unlock(&dev->enable_mutex);
 		return 0;		/* already enabled */
+	}
 
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
@@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 			bars |= (1 << i);
 
 	err = do_pci_enable_device(dev, bars);
-	if (err < 0)
-		atomic_dec(&dev->enable_cnt);
+	if (err >= 0)
+		atomic_inc(&dev->enable_cnt);
+	if (need_lock)
+		mutex_unlock(&dev->enable_mutex);
 	return err;
 }
 
@@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
 	if (dr)
 		dr->enabled = 0;
 
+	mutex_lock(&dev->enable_mutex);
 	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
 		      "disabling already-disabled device");
 
-	if (atomic_dec_return(&dev->enable_cnt) != 0)
+	if (atomic_dec_return(&dev->enable_cnt) != 0) {
+		mutex_unlock(&dev->enable_mutex);
 		return;
+	}
 
 	do_pci_disable_device(dev);
 
 	dev->is_busmaster = 0;
+
+	mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2ec0df04e0dc..977a127ce791 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	mutex_init(&dev->enable_mutex);
 
 	return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 77448215ef5b..cb2760a31fe2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -419,6 +419,7 @@ struct pci_dev {
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
+	struct mutex	enable_mutex;
 
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
-- 
2.20.1
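
The locking idea can be illustrated in userspace with a pthread mutex. This
is a simplified sketch, not the patch itself: the sketch_* names are
hypothetical, and unlike the diff above it counts every caller in
enable_cnt. What it is meant to show is that the "is it enabled?" check and
the actual enabling happen under one lock, so a second caller cannot
observe an enabled device before the memory bit has been set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_en_dev {
	pthread_mutex_t enable_mutex;
	int enable_cnt;         /* number of outstanding enable calls */
	bool memory_enabled;    /* models the PCI_COMMAND_MEMORY bit */
	int hw_enable_calls;    /* how often the slow enabling step ran */
};

static void sketch_en_init(struct sketch_en_dev *dev)
{
	pthread_mutex_init(&dev->enable_mutex, NULL);
	dev->enable_cnt = 0;
	dev->memory_enabled = false;
	dev->hw_enable_calls = 0;
}

static int sketch_enable(struct sketch_en_dev *dev)
{
	pthread_mutex_lock(&dev->enable_mutex);
	if (dev->enable_cnt == 0) {
		/* Slow hardware step, fully completed before unlocking. */
		dev->hw_enable_calls++;
		dev->memory_enabled = true;
	}
	/* The counter only becomes visible once the device really works. */
	dev->enable_cnt++;
	pthread_mutex_unlock(&dev->enable_mutex);
	return 0;
}

static void sketch_disable(struct sketch_en_dev *dev)
{
	pthread_mutex_lock(&dev->enable_mutex);
	assert(dev->enable_cnt > 0);    /* disabling an already-disabled device */
	if (--dev->enable_cnt == 0)
		dev->memory_enabled = false;
	pthread_mutex_unlock(&dev->enable_mutex);
}
```

With the counter incremented before the hardware step and without the lock,
a concurrent caller could see a non-zero count and start memory requests too
early, which is the FAIL case in the diagram above.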


* [PATCH RFC v4 03/21] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

After updating the bridge window resources, the PCI_COMMAND_IO and
PCI_COMMAND_MEMORY bits of the bridge must be updated as well.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 895201d4c9e6..69898fe5255e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1622,6 +1622,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_enable_bridge(bridge);
 
 	if (pci_is_enabled(dev)) {
+		int i, bars = 0;
+
+		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
+			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
+				bars |= (1 << i);
+		}
+		do_pci_enable_device(dev, bars);
+
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
 		mutex_unlock(&dev->enable_mutex);
-- 
2.20.1
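
The mask computation in the loop above can be tried out in isolation. The
following is a userspace sketch: the SKETCH_* constants are stand-ins
chosen for the example (the real index of the first bridge window and the
resource count depend on the kernel configuration), and struct sketch_res
only carries the flags field.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RES_IO            0x100u /* stand-in for IORESOURCE_IO */
#define SKETCH_RES_MEM           0x200u /* stand-in for IORESOURCE_MEM */
#define SKETCH_BRIDGE_RESOURCES  7      /* hypothetical first window index */
#define SKETCH_RESOURCE_COUNT    11     /* hypothetical number of resources */

struct sketch_res {
	unsigned long flags;
};

/* Build a bitmask of the bridge windows that decode I/O or memory. */
static unsigned int sketch_bridge_window_mask(const struct sketch_res *res)
{
	unsigned int bars = 0;

	assert(res != NULL);
	for (int i = SKETCH_BRIDGE_RESOURCES; i < SKETCH_RESOURCE_COUNT; i++) {
		if (res[i].flags & (SKETCH_RES_IO | SKETCH_RES_MEM))
			bars |= 1u << i;
	}
	return bars;
}
```

Only window resources contribute to the mask; a memory resource at a lower
index (a regular BAR slot) is ignored by this loop.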


* [PATCH RFC v4 04/21] PCI: Define PCI-specific version of the release_child_resources()
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Make the released resources of a bridge valid for later re-assignment:
clear the STARTALIGN flag.

Resources marked with PCI_FIXED must preserve their offset and size.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 47 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ec44a0f3a7ac..3644feb13179 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1483,6 +1483,51 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 	 IORESOURCE_MEM_64)
 
+/*
+ * Similar to release_child_resources(), but aware of PCI_FIXED and STARTALIGN flags
+ */
+static void pci_release_child_resources(struct resource *r)
+{
+	struct resource *tmp, *p;
+
+	if (!r)
+		return;
+
+	if (r->flags & IORESOURCE_PCI_FIXED)
+		return;
+
+	p = r->child;
+	r->child = NULL;
+	while (p) {
+		resource_size_t size = resource_size(p);
+
+		tmp = p;
+		p = p->sibling;
+
+		tmp->parent = NULL;
+		tmp->sibling = NULL;
+		pci_release_child_resources(tmp);
+
+		if (!tmp->flags)
+			continue;
+
+		if (tmp->flags & IORESOURCE_PCI_FIXED) {
+			pr_debug("PCI: release fixed %pR (%s), keep its flags, base and size\n",
+				 tmp, tmp->name);
+			continue;
+		}
+
+		pr_debug("PCI: release %pR (%s)\n", tmp, tmp->name);
+
+		/* need to restore size, and keep all the flags but STARTALIGN */
+		tmp->start = 0;
+		tmp->end = size - 1;
+
+		tmp->flags &= ~IORESOURCE_STARTALIGN;
+		tmp->flags |= IORESOURCE_SIZEALIGN;
+	}
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					  unsigned long type)
 {
@@ -1528,7 +1573,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 	 * if there are children under that, we should release them
 	 *  all
 	 */
-	release_child_resources(r);
+	pci_release_child_resources(r);
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_printk(KERN_DEBUG, dev, "resource %d %pR released\n",
-- 
2.20.1
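
The effect of the release on movable and fixed children can be checked with
a userspace model of the resource tree. This is a sketch that follows the
structure of the function in the diff; the struct and the SKETCH_* flag
values are assumptions made for the example and do not match the kernel's
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FIXED       0x1u /* stand-in for IORESOURCE_PCI_FIXED */
#define SKETCH_STARTALIGN  0x2u /* stand-in for IORESOURCE_STARTALIGN */
#define SKETCH_SIZEALIGN   0x4u /* stand-in for IORESOURCE_SIZEALIGN */
#define SKETCH_MEM         0x8u /* stand-in for IORESOURCE_MEM */

struct sketch_resource {
	uint64_t start, end;
	unsigned int flags;
	struct sketch_resource *parent, *sibling, *child;
};

static void sketch_release_children(struct sketch_resource *r)
{
	struct sketch_resource *p, *tmp;

	/* A fixed resource keeps its whole subtree untouched. */
	if (!r || (r->flags & SKETCH_FIXED))
		return;

	p = r->child;
	r->child = NULL;
	while (p) {
		uint64_t size;

		assert(p->end >= p->start);
		size = p->end - p->start + 1;

		tmp = p;
		p = p->sibling;

		/* Detach the child from the tree, then descend into it. */
		tmp->parent = NULL;
		tmp->sibling = NULL;
		sketch_release_children(tmp);

		/* Unused and fixed children keep flags, base and size. */
		if (!tmp->flags || (tmp->flags & SKETCH_FIXED))
			continue;

		/* Movable child: forget the address, remember the size. */
		tmp->start = 0;
		tmp->end = size - 1;
		tmp->flags &= ~SKETCH_STARTALIGN;
		tmp->flags |= SKETCH_SIZEALIGN;
	}
}
```

After the call a movable child starts at 0 and spans its old size, ready to
be placed again, while a fixed child still holds its original range.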


* [PATCH RFC v4 05/21] PCI: hotplug: Add a flag for the movable BARs feature
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If a new PCIe device has been hot-plugged between two active ones
without a big enough gap between their BARs, these BARs should be moved
if their drivers support this feature. The drivers should be notified
and paused during the procedure:

1)                 dev 8 (new)
                       |
                       v
.. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |

2)                             dev 8
                                 |
                                 v
.. |  dev 3  |  dev 3  | -->           --> |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  | -->           --> |  BAR 0  |  BAR 0  |

3)

.. |  dev 3  |  dev 3  |  dev 8  |  dev 8  |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |

Thus, prior reservation of memory regions by BIOS/bootloader/firmware
is not required anymore for the PCIe hotplug.
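
The three steps in the diagram can be replayed with a toy layout. This is
a deliberately simplified sketch and not the allocator used by the
patchset: it ignores alignment and bridge windows, shifts every BAR behind
the insertion point by the same amount, and gives up if one of those BARs
is fixed or would leave the window. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bar {
	const char *name;
	uint64_t start;
	uint64_t size;
	bool fixed;     /* driver cannot handle a move */
};

/*
 * Open a gap of 'gap' bytes behind bars[after] by shifting all later BARs
 * upwards. Returns false and changes nothing if a later BAR is fixed or
 * would end beyond 'window_end'.
 */
static bool sketch_open_gap(struct sketch_bar *bars, size_t n, size_t after,
			    uint64_t gap, uint64_t window_end)
{
	size_t i;

	assert(after < n);

	/* First pass: check that every affected BAR may and can move. */
	for (i = after + 1; i < n; i++) {
		if (bars[i].fixed)
			return false;
		if (bars[i].start + bars[i].size - 1 + gap > window_end)
			return false;
	}

	/* Second pass: actually move them. */
	for (i = after + 1; i < n; i++)
		bars[i].start += gap;

	return true;
}
```

For the layout in the diagram, opening a gap for the two BARs of dev 8
behind dev 3 BAR 1 pushes dev 5 and dev 7 up by the combined size of the
new BARs; marking one of them as fixed makes the same request fail.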

The PCI_MOVABLE_BARS flag is set by the platform if this feature is
supported and tested, but can be overridden by the following command
line option:
    pcie_movable_bars={ off | force }
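
The precedence between the option and the platform flag can be written
down as a small userspace model. It mirrors the order of the checks in the
diff below ("off" first, then "force", then the platform flag); the
sketch_* names and the config struct are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_movable_cfg {
	bool off;               /* "pcie_movable_bars=off" was given */
	bool force;             /* "pcie_movable_bars=force" was given */
	bool platform_flag;     /* models the PCI_MOVABLE_BARS flag */
};

/* Parse the option value; unknown values leave the config unchanged. */
static void sketch_parse_option(struct sketch_movable_cfg *cfg, const char *str)
{
	assert(cfg != NULL && str != NULL);
	if (!strcmp(str, "off"))
		cfg->off = true;
	else if (!strcmp(str, "force"))
		cfg->force = true;
}

static bool sketch_movable_bars_enabled(const struct sketch_movable_cfg *cfg)
{
	if (cfg->off)
		return false;
	if (cfg->force)
		return true;
	return cfg->platform_flag;
}
```

In this model "off" wins over the platform flag, "force" enables the
feature on a platform that does not declare it, and an unrecognised value
leaves the platform default in effect.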

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++++++
 drivers/pci/pci.c                             | 24 +++++++++++++++++++
 include/linux/pci.h                           |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..d40eaf993f80 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3417,6 +3417,13 @@
 		nomsi	Do not use MSI for native PCIe PME signaling (this makes
 			all PCIe root ports use INTx for all services).
 
+	pcie_movable_bars=[PCIE]
+			Override the movable BARs support detection:
+		off
+			Disable even if supported by the platform
+		force
+			Enable even if not explicitly declared as supported
+
 	pcmv=		[HW,PCMCIA] BadgePAD 4
 
 	pd_ignore_unused
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 69898fe5255e..4dac49a887ec 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static bool pcie_movable_bars_off;
+static bool pcie_movable_bars_force;
+static int __init pcie_movable_bars_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		pcie_movable_bars_off = true;
+	else if (!strcmp(str, "force"))
+		pcie_movable_bars_force = true;
+	return 1;
+}
+__setup("pcie_movable_bars=", pcie_movable_bars_setup);
+
+bool pci_movable_bars_enabled(void)
+{
+	if (pcie_movable_bars_off)
+		return false;
+
+	if (pcie_movable_bars_force)
+		return true;
+
+	return pci_has_flag(PCI_MOVABLE_BARS);
+}
+EXPORT_SYMBOL(pci_movable_bars_enabled);
+
 /* Time to wait after a reset for device to become responsive */
 #define PCIE_RESET_READY_POLL_MS 60000
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cb2760a31fe2..cbe661aff9f5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -866,6 +866,7 @@ enum {
 	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
 	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
 	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
+	PCI_MOVABLE_BARS	= 0x00000080,	/* Runtime BAR reassign after hotplug */
 };
 
 /* These external functions are only available when PCI support is enabled */
@@ -1345,6 +1346,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
+bool pci_movable_bars_enabled(void);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 06/21] PCI: Pause the devices with movable BARs during rescan
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Drivers indicate their support of movable BARs by implementing the
new rescan_prepare() and rescan_done() hooks in the struct pci_driver.

All of the device's activity must be stopped during a rescan, and
iounmap() + ioremap() must be applied to every used BAR.
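
The contract between the PCI core and a driver can be sketched in user
space as follows. This is a model under stated assumptions, not kernel
code: struct model_dev, struct model_driver, the demo_* hooks and
model_rescan() are hypothetical, and the assignments to mapped_phys only
stand in for iounmap() and ioremap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a device with a single BAR. */
struct model_dev {
	uint64_t bar_phys;	/* where the "PCI core" placed the BAR */
	uint64_t mapped_phys;	/* address the driver has mapped, 0 = none */
	int io_active;		/* nonzero while the driver may touch the BAR */
};

/* Only the two hooks this patch adds are modelled. */
struct model_driver {
	void (*rescan_prepare)(struct model_dev *dev);
	void (*rescan_done)(struct model_dev *dev);
};

static void demo_rescan_prepare(struct model_dev *dev)
{
	dev->io_active = 0;	/* stop all device activity */
	dev->mapped_phys = 0;	/* stands in for iounmap() */
}

static void demo_rescan_done(struct model_dev *dev)
{
	dev->mapped_phys = dev->bar_phys;	/* stands in for ioremap() */
	dev->io_active = 1;
}

const struct model_driver demo_driver = {
	.rescan_prepare	= demo_rescan_prepare,
	.rescan_done	= demo_rescan_done,
};

/*
 * Model of the core side: pause the driver, move the BAR, resume.
 * Returns 1 if the BAR was moved and 0 if it was left in place because
 * the driver lacks the hooks.
 */
int model_rescan(const struct model_driver *drv, struct model_dev *dev,
		 uint64_t new_bar_phys)
{
	assert(dev != NULL);
	if (!drv || !drv->rescan_prepare || !drv->rescan_done)
		return 0;

	drv->rescan_prepare(dev);
	dev->bar_phys = new_bar_phys;
	drv->rescan_done(dev);
	return 1;
}
```

The point of the model is the ordering: the driver drops its mapping
before the address changes and only picks up the new address once the
rescan has finished.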

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 51 +++++++++++++++++++++++++++++++++++++++++++--
 include/linux/pci.h |  2 ++
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 977a127ce791..88350dd56344 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3248,6 +3248,38 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child) {
+			pci_bus_rescan_prepare(child);
+		} else if (dev->driver &&
+			   dev->driver->rescan_prepare) {
+			dev->driver->rescan_prepare(dev);
+		}
+	}
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child) {
+			pci_bus_rescan_done(child);
+		} else if (dev->driver &&
+			   dev->driver->rescan_done) {
+			dev->driver->rescan_done(dev);
+		}
+	}
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3261,8 +3293,23 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
 
-	max = pci_scan_child_bus(bus);
-	pci_assign_unassigned_bus_resources(bus);
+	if (pci_movable_bars_enabled()) {
+		struct pci_bus *root = bus;
+
+		while (!pci_is_root_bus(root))
+			root = root->parent;
+
+		pci_bus_rescan_prepare(root);
+
+		max = pci_scan_child_bus(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		pci_bus_rescan_done(root);
+	} else {
+		max = pci_scan_child_bus(bus);
+		pci_assign_unassigned_bus_resources(bus);
+	}
+
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cbe661aff9f5..3d52f5538282 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -780,6 +780,8 @@ struct pci_driver {
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+	void (*rescan_prepare)(struct pci_dev *dev);
+	void (*rescan_done)(struct pci_dev *dev);
 	const struct pci_error_handlers *err_handler;
 	const struct attribute_group **groups;
 	struct device_driver	driver;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 07/21] PCI: Wake up bridges during rescan when movable BARs enabled
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Use the PM runtime methods to wake up the bridges before accessing
their config space.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 88350dd56344..dc935f82a595 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3252,6 +3252,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
 
+	pm_runtime_get_sync(&bus->dev);
+
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
@@ -3278,6 +3280,8 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 			dev->driver->rescan_done(dev);
 		}
 	}
+
+	pm_runtime_put(&bus->dev);
 }
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 08/21] nvme-pci: Handle movable BARs
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Hotplugged devices can affect the existing ones by moving their BARs.
The PCI subsystem informs the NVMe driver about this by invoking
rescan_prepare() + rescan_done(), in which the driver must iounmap()
and then ioremap() its BAR.
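
The decision of whether an existing mapping can be kept may be easier to
follow in isolation. The sketch below is a userspace model, not the
driver code: struct model_mapping and model_mapping_reusable() are
hypothetical names. It assumes that a mapping is reusable only if it
exists, still points at the BAR's current physical address, and is large
enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the driver's view of its BAR mapping. */
struct model_mapping {
	bool mapped;		/* true while a mapping exists */
	uint64_t phys;		/* physical address that was mapped */
	size_t mapped_size;	/* number of bytes mapped */
};

/*
 * Return true when the existing mapping still covers `size` bytes of a
 * BAR located at `bar_phys`; otherwise the caller has to unmap and map
 * again.
 */
bool model_mapping_reusable(const struct model_mapping *m,
			    uint64_t bar_phys, size_t size)
{
	assert(m != NULL);
	return m->mapped &&
	       m->phys == bar_phys &&
	       size <= m->mapped_size;
}
```

Before movable BARs, comparing sizes was enough because the physical
address could not change under a bound driver; the address comparison is
what a moved BAR adds.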

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/nvme/host/pci.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 92bad1c810ac..ccea3033a67a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -106,6 +106,7 @@ struct nvme_dev {
 	unsigned int num_vecs;
 	int q_depth;
 	u32 db_stride;
+	resource_size_t current_phys_bar;
 	void __iomem *bar;
 	unsigned long bar_mapped_size;
 	struct work_struct remove_work;
@@ -1672,13 +1673,16 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar &&
+	    dev->current_phys_bar == pci_resource_start(pdev, 0) &&
+	    size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
 	if (dev->bar)
 		iounmap(dev->bar);
-	dev->bar = ioremap(pci_resource_start(pdev, 0), size);
+	dev->current_phys_bar = pci_resource_start(pdev, 0);
+	dev->bar = ioremap(dev->current_phys_bar, size);
 	if (!dev->bar) {
 		dev->bar_mapped_size = 0;
 		return -ENOMEM;
@@ -2504,6 +2508,8 @@ static void nvme_reset_work(struct work_struct *work)
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
 		goto out;
 
+	nvme_remap_bar(dev, db_bar_size(dev, 0));
+
 	/*
 	 * If we're called to reset a live controller first shut it down before
 	 * moving on.
@@ -2910,6 +2916,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -2974,6 +2997,8 @@ static struct pci_driver nvme_driver = {
 	},
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If a PCIe device driver doesn't yet support movable BARs, mark the
device's BARs with IORESOURCE_PCI_FIXED.
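
Which resources end up pinned can be illustrated with a small userspace
model. Everything here is hypothetical: the MODEL_RES_* values are
illustrative bits rather than the kernel's IORESOURCE_* constants, and
has_parent stands in for a resource that has been claimed in the
resource tree. Unimplemented, unclaimed, unset and already-fixed
resources are skipped; the rest are marked as fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; not the kernel's IORESOURCE_* values. */
#define MODEL_RES_UNSET	0x1u
#define MODEL_RES_FIXED	0x2u
#define MODEL_RES_MEM	0x4u

/* Userspace stand-in for one device resource. */
struct model_res {
	unsigned int flags;	/* 0 means the slot is not implemented */
	int has_parent;		/* nonzero once claimed in the resource tree */
};

/* Mark every live, assigned resource as fixed; return how many changed. */
size_t model_mark_fixed(struct model_res *res, size_t n)
{
	size_t marked = 0;
	size_t i;

	assert(res != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct model_res *r = &res[i];

		if (!r->flags || !r->has_parent ||
		    (r->flags & (MODEL_RES_UNSET | MODEL_RES_FIXED)))
			continue;

		r->flags |= MODEL_RES_FIXED;
		marked++;
	}
	return marked;
}
```

Running the function twice over the same array marks nothing the second
time, which mirrors why a repeated rescan does not warn again about the
same BAR.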

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index dc935f82a595..1cf6ec960236 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3262,6 +3262,21 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 		} else if (dev->driver &&
 			   dev->driver->rescan_prepare) {
 			dev->driver->rescan_prepare(dev);
+		} else if (dev->driver || ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)) {
+			int i;
+
+			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+				struct resource *r = &dev->resource[i];
+
+				if (!r->flags || !r->parent ||
+				    (r->flags & IORESOURCE_UNSET) ||
+				    (r->flags & IORESOURCE_PCI_FIXED))
+					continue;
+
+				r->flags |= IORESOURCE_PCI_FIXED;
+				pci_warn(dev, "%s: no support for movable BARs, mark BAR %d (%pR) as fixed\n",
+					 __func__, i, r);
+			}
 		}
 	}
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 10/21] PCI: Fix assigning of fixed prefetchable resources
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Allow fixed prefetchable resources to be matched to non-prefetchable
windows, as is already done for movable resources.
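
The matching rule can be written down as a truth function. The sketch
below is a userspace model with hypothetical names: the MODEL_* bits are
illustrative and are not the kernel's IORESOURCE_* values, and the
address-range containment check that the real code also performs is left
out. It assumes the rule is asymmetric: a prefetchable resource may sit
in a non-prefetchable window, but not the other way round.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; not the kernel's IORESOURCE_* values. */
#define MODEL_IO	0x1u
#define MODEL_MEM	0x2u
#define MODEL_TYPE_BITS	(MODEL_IO | MODEL_MEM)
#define MODEL_PREFETCH	0x4u

/*
 * Return true when a resource with res_flags may be placed in a bridge
 * window with window_flags. Only the type and prefetch rules are
 * modelled.
 */
bool model_window_accepts(unsigned int window_flags, unsigned int res_flags)
{
	/* A resource without a type is a caller bug in this model. */
	assert((res_flags & MODEL_TYPE_BITS) != 0);

	/* I/O goes to I/O windows, memory to memory windows. */
	if ((window_flags & MODEL_TYPE_BITS) != (res_flags & MODEL_TYPE_BITS))
		return false;

	/* A prefetchable window must not hold a non-prefetchable resource. */
	if ((window_flags & MODEL_PREFETCH) && !(res_flags & MODEL_PREFETCH))
		return false;

	return true;
}
```

With the old mask, which compared the prefetch bit for equality, the
case of a prefetchable resource in a non-prefetchable window would have
been rejected.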

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3644feb13179..be7d4e6d7b65 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1301,15 +1301,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
-	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-			     IORESOURCE_PREFETCH;
+	unsigned long mask = IORESOURCE_TYPE_BITS;
 
 	pci_bus_for_each_resource(b, parent_r, i) {
 		if (!parent_r)
 			continue;
 
-		if ((r->flags & mask) == (parent_r->flags & mask) &&
-		    resource_contains(parent_r, r))
+		if ((r->flags & mask) != (parent_r->flags & mask))
+			continue;
+
+		if (parent_r->flags & IORESOURCE_PREFETCH &&
+		    !(r->flags & IORESOURCE_PREFETCH))
+			continue;
+
+		if (resource_contains(parent_r, r))
 			request_resource(parent_r, r);
 	}
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 11/21] PCI: Release and reassign the root bridge resources during rescan
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When the movable BARs feature is enabled, don't rely on the memory gaps
reserved by the BIOS/bootloader/firmware, but instead rearrange the BARs
and bridge windows starting from the root.

Endpoint devices' BARs, after being released, are re-sorted and written
back by pci_assign_unassigned_root_bus_resources().

The last step of writing the recalculated windows to the bridges is done
by the new pci_setup_bridges() function.
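
The order in which the bridges get their windows written can be shown
with a userspace model of the traversal. The names struct model_bus and
model_setup_bridges() are hypothetical, and the model only records bus
ids instead of programming hardware. It assumes a post-order walk:
everything below a bridge is handled before the bridge itself, and the
root bus, which has no upstream bridge, is never written.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4

/* Userspace stand-in for a bus and the buses directly below it. */
struct model_bus {
	int id;
	int is_root;		/* the root bus has no upstream bridge */
	struct model_bus *child[MODEL_MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Append to out[] the ids of the buses whose bridges would be set up,
 * in the order they would be set up. `pos` is the number of entries
 * already in out[], `cap` its capacity. Returns the new entry count.
 */
size_t model_setup_bridges(const struct model_bus *bus, int *out,
			   size_t pos, size_t cap)
{
	size_t i;

	assert(bus != NULL && out != NULL);

	/* Children first ... */
	for (i = 0; i < bus->nr_children; i++)
		pos = model_setup_bridges(bus->child[i], out, pos, cap);

	/* ... then the bridge leading to this bus, if there is one. */
	if (!bus->is_root) {
		assert(pos < cap);
		out[pos++] = bus->id;
	}
	return pos;
}
```

For a root bus with bus 1 (holding buses 2 and 3) and bus 4 below it,
the model records the order 2, 3, 1, 4.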

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/probe.c     | 22 ++++++++++++++++++++++
 drivers/pci/setup-bus.c | 11 ++++++++++-
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 224d88634115..e06e8692a7b1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -248,6 +248,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1cf6ec960236..692752c71f71 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3299,6 +3299,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 	pm_runtime_put(&bus->dev);
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child;
+
+		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+			continue;
+
+		child = dev->subordinate;
+		if (child)
+			pci_setup_bridges(child);
+	}
+
+	if (bus->self)
+		pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3321,8 +3340,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+
+		pci_bus_release_root_bridge_resources(root);
 		pci_assign_unassigned_root_bus_resources(root);
 
+		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
 	} else {
 		max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index be7d4e6d7b65..36a1907d9509 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1584,7 +1584,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		pci_printk(KERN_DEBUG, dev, "resource %d %pR released\n",
 					PCI_BRIDGE_RESOURCES + idx, r);
 		/* keep the old size */
-		r->end = resource_size(r) - 1;
+		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);
 		r->start = 0;
 		r->flags = 0;
 
@@ -1637,6 +1637,15 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
 		pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
 	struct resource *res;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 12/21] PCI: Don't allow hotplugged devices to steal resources
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

When movable BARs are enabled, the PCI subsystem first releases all
the bridge windows and then attempts to assign the newly requested
resources and to re-assign the existing ones.

If a hotplugged device gets its resources first, there may be no
space left to re-assign the resources of already working devices,
which is unacceptable. If this happens, this patch marks one of the
new devices with the newly introduced PCI_DEV_IGNORE flag and retries
the resource assignment.

This patch also adds a res_mask bitmask to struct pci_dev for storing
the indices of the assigned resources, so that a resource lost during
the rescan can be detected.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
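For readers who want the idea without the kernel plumbing, the retry
loop described above can be modelled in userspace roughly as follows.
This is only a sketch under simplifying assumptions: a single window
with a flat capacity, no alignment, and devices served in list order.
All model_* names are made up for illustration and are not part of
this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device competing for space in one window. */
struct model_dev {
	unsigned int size;	/* space the device requests */
	bool is_new;		/* hotplugged during this rescan */
	bool ignored;		/* stands in for PCI_DEV_IGNORE */
	bool assigned;		/* result of the last assignment pass */
};

/* One assignment pass: hand out space in list order, skip ignored devices. */
static void model_assign(struct model_dev *devs, size_t n, unsigned int window)
{
	unsigned int used = 0;

	for (size_t i = 0; i < n; i++) {
		devs[i].assigned = false;
		if (devs[i].ignored)
			continue;
		if (devs[i].size <= window - used) {
			used += devs[i].size;
			devs[i].assigned = true;
		}
	}
}

/* True if every device that existed before the rescan got its space back. */
static bool model_validate(const struct model_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].is_new && !devs[i].assigned)
			return false;
	return true;
}

/*
 * Retry the assignment, ignoring one more new device per failed pass.
 * Returns the number of ignored devices, or -1 if the old devices do not
 * fit even after every new device has been ignored.
 */
static int model_reassign(struct model_dev *devs, size_t n, unsigned int window)
{
	int ignored = 0;

	assert(devs || n == 0);

	for (;;) {
		size_t i;

		model_assign(devs, n, window);
		if (model_validate(devs, n))
			return ignored;

		/* Find the next new device that is not ignored yet. */
		for (i = 0; i < n; i++)
			if (devs[i].is_new && !devs[i].ignored)
				break;
		if (i == n)
			return -1;

		devs[i].ignored = true;
		ignored++;
	}
}
```

With a window of 100 units, a new device asking for 60 ahead of an old
device asking for 50 makes the first pass fail; the new device is then
ignored and the second pass succeeds.
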
 drivers/pci/bus.c       |   5 ++
 drivers/pci/pci.h       |  11 +++++
 drivers/pci/probe.c     | 100 +++++++++++++++++++++++++++++++++++++++-
 drivers/pci/setup-bus.c |  15 ++++++
 include/linux/pci.h     |   1 +
 5 files changed, 130 insertions(+), 2 deletions(-)
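
The res_mask comparison used for validation boils down to one bitwise
expression. The sketch below shows it in isolation; MODEL_NUM_RES and
the model_* helpers are hypothetical stand-ins, and the array of
booleans replaces the real checks on r->flags and r->parent.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_RES 32	/* hypothetical upper bound on tracked resources */

/* Build a mask with bit i set for every resource i that is assigned. */
static unsigned int model_res_mask(const bool assigned[], int n)
{
	unsigned int mask = 0;

	assert(n >= 0 && n <= MODEL_NUM_RES);

	for (int i = 0; i < n; i++)
		if (assigned[i])
			mask |= 1u << i;

	return mask;
}

/* Bits set before the rescan but clear after it: resources that were lost. */
static unsigned int model_lost_resources(unsigned int before, unsigned int after)
{
	return before & ~after;
}
```

A non-zero result means that a previously working resource was not
re-assigned. Newly gained bits are deliberately not reported.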

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 5cb40b2518f9..a9784144d6f2 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -311,6 +311,11 @@ void pci_bus_add_device(struct pci_dev *dev)
 {
 	int retval;
 
+	if (pci_dev_is_ignored(dev)) {
+		pci_warn(dev, "%s: don't enable the ignored device\n", __func__);
+		return;
+	}
+
 	/*
 	 * Can not put in pci_device_add yet because resources
 	 * are not assigned yet for some devices.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e06e8692a7b1..56b905068ac5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -366,6 +366,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_IGNORE 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -377,6 +378,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
 }
 
+static inline void pci_dev_ignore(struct pci_dev *dev, bool ignore)
+{
+	assign_bit(PCI_DEV_IGNORE, &dev->priv_flags, ignore);
+}
+
+static inline bool pci_dev_is_ignored(const struct pci_dev *dev)
+{
+	return test_bit(PCI_DEV_IGNORE, &dev->priv_flags);
+}
+
 #ifdef CONFIG_PCIEAER
 #include <linux/aer.h>
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 692752c71f71..62f4058a001f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3248,6 +3248,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+static unsigned int pci_dev_res_mask(struct pci_dev *dev)
+{
+	unsigned int res_mask = 0;
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+			continue;
+
+		res_mask |= (1 << i);
+	}
+
+	return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3257,6 +3274,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
+		dev->res_mask = pci_dev_res_mask(dev);
+
 		if (child) {
 			pci_bus_rescan_prepare(child);
 		} else if (dev->driver &&
@@ -3318,6 +3337,84 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child_bus = dev->subordinate;
+
+		if (!pci_dev_is_added(dev) && !pci_dev_is_ignored(dev))
+			return dev;
+
+		if (child_bus) {
+			struct pci_dev *next_new_dev;
+
+			next_new_dev = pci_find_next_new_device(child_bus);
+			if (next_new_dev)
+				return next_new_dev;
+		}
+	}
+
+	return NULL;
+}
+
+static bool pci_bus_validate_resources(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	bool ret = true;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+		unsigned int res_mask = pci_dev_res_mask(dev);
+
+		if (pci_dev_is_ignored(dev))
+			continue;
+
+		if (dev->res_mask & ~res_mask) {
+			pci_err(dev, "%s: Non-re-enabled resources found: 0x%x -> 0x%x\n",
+				__func__, dev->res_mask, res_mask);
+			ret = false;
+		}
+
+		if (child && !pci_bus_validate_resources(child))
+			ret = false;
+	}
+
+	return ret;
+}
+
+static void pci_reassign_root_bus_resources(struct pci_bus *root)
+{
+	do {
+		struct pci_dev *next_new_dev;
+
+		pci_bus_release_root_bridge_resources(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		if (pci_bus_validate_resources(root))
+			break;
+
+		next_new_dev = pci_find_next_new_device(root);
+		if (!next_new_dev) {
+			dev_err(&root->dev, "%s: failed to re-assign resources even after ignoring all the hotplugged devices\n",
+				__func__);
+			break;
+		}
+
+		dev_warn(&root->dev, "%s: failed to re-assign resources, disable the next hotplugged device %s and retry\n",
+			 __func__, dev_name(&next_new_dev->dev));
+
+		pci_dev_ignore(next_new_dev, true);
+	} while (true);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3341,8 +3438,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 		max = pci_scan_child_bus(root);
 
-		pci_bus_release_root_bridge_resources(root);
-		pci_assign_unassigned_root_bus_resources(root);
+		pci_reassign_root_bus_resources(root);
 
 		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 36a1907d9509..551108f48df7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -131,6 +131,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	int i;
 
+	if (pci_dev_is_ignored(dev))
+		return;
+
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
 		struct pci_dev_resource *dev_res, *tmp;
@@ -181,6 +184,9 @@ static void __dev_sort_resources(struct pci_dev *dev,
 {
 	u16 class = dev->class >> 8;
 
+	if (pci_dev_is_ignored(dev))
+		return;
+
 	/* Don't touch classless devices or host bridges or ioapics.  */
 	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
 		return;
@@ -284,6 +290,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		if (pci_dev_is_ignored(dev_res->dev))
+			continue;
+
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
@@ -991,6 +1000,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
+		if (pci_dev_is_ignored(dev))
+			continue;
+
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
@@ -1353,6 +1365,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (pci_dev_is_ignored(dev))
+			continue;
+
 		pdev_assign_fixed_resources(dev);
 
 		b = dev->subordinate;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3d52f5538282..26aa59cb6220 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -369,6 +369,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	unsigned int	res_mask;		/* Bitmask of assigned resources */
 
 	bool		match_driver;		/* Skip attaching driver */
 
-- 
2.20.1


* [PATCH RFC v4 13/21] PCI: Include fixed BARs into the bus size calculating
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

The only difference between fixed and movable BARs is that a fixed BAR
keeps its offset during the release+reassign procedure on PCIe rescan.

Once fixed BARs are included in the result of pbus_size_mem(), their
assignment can be restricted to the direct parent bus only.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
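The effect on the size calculation can be sketched in userspace as
below. This is a deliberately simplified model: it only sums sizes and
ignores the alignment handling that pbus_size_mem() performs. The
model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one BAR requested under a bus. */
struct model_bar {
	unsigned long size;
	bool fixed;	/* stands in for IORESOURCE_PCI_FIXED */
};

/*
 * Sum the space a bus has to provide. Fixed BARs are counted only when
 * movable BARs are enabled; otherwise they are skipped as before.
 */
static unsigned long model_bus_size(const struct model_bar *bars, size_t n,
				    bool movable_bars)
{
	unsigned long size = 0;

	assert(bars || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (bars[i].fixed) {
			if (movable_bars)
				size += bars[i].size;
			continue;
		}
		size += bars[i].size;
	}

	return size;
}
```

For one fixed 4 KiB BAR and one movable 8 KiB BAR the model yields
8 KiB without the feature and 12 KiB with it.
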
 drivers/pci/setup-bus.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 551108f48df7..9d93f2b32bf1 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1007,12 +1007,20 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			if (r->parent ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
 				continue;
 			r_size = resource_size(r);
+
+			if (r->flags & IORESOURCE_PCI_FIXED) {
+				if (pci_movable_bars_enabled())
+					size += r_size;
+
+				continue;
+			}
+
 #ifdef CONFIG_PCI_IOV
 			/* put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
@@ -1351,6 +1359,8 @@ static void pdev_assign_fixed_resources(struct pci_dev *dev)
 		while (b && !r->parent) {
 			assign_fixed_resource_on_bus(b, r);
 			b = b->parent;
+			if (!r->parent && pci_movable_bars_enabled())
+				break;
 		}
 	}
 }
-- 
2.20.1


* [PATCH RFC v4 14/21] PCI: Don't reserve memory for hotplug when enabled movable BARs
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

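pbus_size_mem() returns the precise amount of memory required to fit
all the requested BARs and the windows of child bridges. With movable
BARs enabled, the windows are recalculated on every rescan, so there
is no need to reserve additional space (pci_hotplug_io_size and
pci_hotplug_mem_size) for hotplug bridges.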

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
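As a rough illustration of the changed condition, the following
userspace sketch models how much space a bridge window asks for. It
treats the hotplug reservation as a plain addition, which is a
simplification of the kernel's handling of additional sizes; the
model_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Space requested for a bridge window: the exact requirement of the
 * children plus, for hotplug bridges without movable BARs, a reserve
 * for devices that may be plugged in later.
 */
static unsigned long model_window_size(unsigned long required,
				       bool is_hotplug_bridge,
				       bool movable_bars,
				       unsigned long hotplug_reserve)
{
	unsigned long extra = 0;

	if (is_hotplug_bridge && !movable_bars)
		extra = hotplug_reserve;

	assert(required <= ULONG_MAX - extra);	/* no overflow */

	return required + extra;
}
```

With movable BARs enabled the reserve is never added, because the
window can be grown during a later rescan instead.
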
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 9d93f2b32bf1..f9d605cd1725 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1229,7 +1229,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_HEADER_TYPE_BRIDGE:
 		pci_bridge_check_ranges(bus);
-		if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mem_size = pci_hotplug_mem_size;
 		}
-- 
2.20.1


* [PATCH RFC v4 15/21] PCI: Allow the failed resources to be reassigned later
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Don't lose the size of a requested endpoint (EP) BAR if it doesn't fit
in the current trial, so that the assignment can be retried.

A failed bridge window, however, must be dropped and recalculated in
the next trial.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
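The two cases can be contrasted with a small userspace sketch. It is
an assumption-laden model, not the kernel code: model_resource mirrors
only the start/end/flags fields of struct resource, and the caller
tells the helper whether the resource is a bridge window instead of
checking the resource index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down counterpart of struct resource. */
struct model_resource {
	uint64_t start;
	uint64_t end;
	uint64_t flags;
};

/* Size of a resource with an inclusive end address. */
static uint64_t model_resource_size(const struct model_resource *r)
{
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/*
 * Handle a failed assignment when movable BARs are enabled:
 * a bridge window is cleared so that it gets recalculated, while an
 * endpoint BAR is left untouched so that its size survives for a retry.
 */
static void model_assign_failed(struct model_resource *r, bool is_bridge_window)
{
	if (is_bridge_window) {
		r->start = 0;
		r->end = 0;
		r->flags = 0;
	}
}
```

After a failure the endpoint BAR still reports its original size,
whereas the bridge window no longer carries any flags and is sized
from scratch on the next trial.
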
 drivers/pci/setup-bus.c |  3 ++-
 drivers/pci/setup-res.c | 12 ++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f9d605cd1725..c1559a4a8564 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -309,7 +309,8 @@ static void assign_requested_resources_sorted(struct list_head *head,
 						    0 /* don't care */,
 						    0 /* don't care */);
 			}
-			reset_resource(res);
+			if (!pci_movable_bars_enabled())
+				reset_resource(res);
 		}
 	}
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..732d18f60f1b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
 
 	bus = dev->bus;
 	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+		if (pci_movable_bars_enabled()) {
+			if (resno >= PCI_BRIDGE_RESOURCES &&
+			    resno <= PCI_BRIDGE_RESOURCE_END) {
+				struct resource *res = dev->resource + resno;
+
+				res->start = 0;
+				res->end = 0;
+				res->flags = 0;
+			}
+			break;
+		}
+
 		if (!bus->parent || !bus->self->transparent)
 			break;
 		bus = bus->parent;
-- 
2.20.1


* [PATCH RFC v4 16/21] PCI: Calculate fixed areas of bridge windows based on fixed BARs
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

For every (IO, MEM, MEM64) bridge window, calculate the range covered
by the fixed resources of its child endpoints and child bridge windows:

| <- BAR -> |    | <- child bus fixed_range_hard -> |   | <- fixed BAR -> |
                 | <-            bus's fixed_range_hard                -> |
| <-                       bus's bridge window                         -> |

These ranges will later be used to arrange the bridge windows so that
they cover every immovable BAR as well as the movable ones during hotplug.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
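The range calculation is a min/max fold with an "empty" sentinel
(start at the maximum value, end at zero). A minimal userspace sketch
of that idea follows; model_range and its helpers are hypothetical
names, and a single range is tracked instead of one per window type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counterpart of one fixed_range_hard[] entry. */
struct model_range {
	uint64_t start;
	uint64_t end;
};

/* Mark the range as empty: any real address is below start and above end. */
static void model_range_init(struct model_range *r)
{
	r->start = UINT64_MAX;
	r->end = 0;
}

static bool model_range_is_empty(const struct model_range *r)
{
	return r->start > r->end;
}

/* Widen the range so that it also covers [start, end]. */
static void model_range_cover(struct model_range *r, uint64_t start, uint64_t end)
{
	if (start < r->start)
		r->start = start;
	if (end > r->end)
		r->end = end;
}

/* Account for one fixed BAR of an endpoint on this bus. */
static void model_range_add_bar(struct model_range *r, uint64_t start, uint64_t end)
{
	assert(start <= end);
	model_range_cover(r, start, end);
}

/* Account for the fixed range of a child bus; an empty child changes nothing. */
static void model_range_add_child(struct model_range *r,
				  const struct model_range *child)
{
	model_range_cover(r, child->start, child->end);
}
```

Because of the sentinel values, merging an empty child range needs no
special case: neither comparison in model_range_cover() fires.
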
 drivers/pci/pci.h       | 14 +++++++
 drivers/pci/probe.c     | 82 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c | 17 +++++++++
 include/linux/pci.h     |  6 +++
 4 files changed, 119 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 56b905068ac5..14e3ebe68010 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -364,6 +364,20 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 	return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+	int idx = 1;
+
+	if (r->flags & IORESOURCE_IO)
+		idx = 0;
+	else if (!(r->flags & IORESOURCE_PREFETCH))
+		idx = 1;
+	else if (r->flags & IORESOURCE_MEM_64)
+		idx = 2;
+
+	return idx;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_IGNORE 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 62f4058a001f..70b15654f253 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -551,6 +551,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
 	struct pci_bus *b;
+	int idx;
 
 	b = kzalloc(sizeof(*b), GFP_KERNEL);
 	if (!b)
@@ -567,6 +568,11 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	if (parent)
 		b->domain_nr = parent->domain_nr;
 #endif
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		b->fixed_range_hard[idx].start = (resource_size_t)-1;
+		b->fixed_range_hard[idx].end = 0;
+	}
+
 	return b;
 }
 
@@ -3337,6 +3343,81 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_fixed_range_hard(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int idx;
+	resource_size_t start, end;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		bus->fixed_range_hard[idx].start = (resource_size_t)-1;
+		bus->fixed_range_hard[idx].end = 0;
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_fixed_range_hard(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+				continue;
+
+			if (r->flags & IORESOURCE_PCI_FIXED) {
+				idx = pci_get_bridge_resource_idx(r);
+				start = bus->fixed_range_hard[idx].start;
+				end = bus->fixed_range_hard[idx].end;
+
+				if (start > r->start)
+					start = r->start;
+				if (end < r->end)
+					end = r->end;
+
+				if (bus->fixed_range_hard[idx].start != start ||
+				    bus->fixed_range_hard[idx].end != end) {
+					dev_dbg(&bus->dev, "%s: Found fixed 0x%llx-0x%llx in %s, expand the fixed bridge window %d to 0x%llx-0x%llx\n",
+						__func__,
+						(unsigned long long)r->start,
+						(unsigned long long)r->end,
+						dev_name(&dev->dev), idx,
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->fixed_range_hard[idx].start = start;
+					bus->fixed_range_hard[idx].end = end;
+				}
+			}
+		}
+
+		if (dev->subordinate) {
+			struct pci_bus *child = dev->subordinate;
+
+			for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+				start = bus->fixed_range_hard[idx].start;
+				end = bus->fixed_range_hard[idx].end;
+
+				if (start > child->fixed_range_hard[idx].start)
+					start = child->fixed_range_hard[idx].start;
+				if (end < child->fixed_range_hard[idx].end)
+					end = child->fixed_range_hard[idx].end;
+
+				if (start < bus->fixed_range_hard[idx].start ||
+				    end > bus->fixed_range_hard[idx].end) {
+					dev_dbg(&bus->dev, "%s: Expand the fixed bridge window %d from %s to 0x%llx-0x%llx\n",
+						__func__, idx, dev_name(&child->dev),
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->fixed_range_hard[idx].start = start;
+					bus->fixed_range_hard[idx].end = end;
+				}
+			}
+		}
+	}
+}
+
 static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3437,6 +3518,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+		pci_bus_update_fixed_range_hard(root);
 
 		pci_reassign_root_bus_resources(root);
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c1559a4a8564..a1fd7f3c5ea8 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -879,9 +879,17 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
+	resource_size_t fixed_start = bus->fixed_range_hard[0].start;
+	resource_size_t fixed_end = bus->fixed_range_hard[0].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
 	if (!b_res)
 		return;
 
+	if (min_size < fixed_size)
+		min_size = fixed_size;
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -990,6 +998,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	bool is_mem64 = (mask & IORESOURCE_MEM_64);
+
+	resource_size_t fixed_start = bus->fixed_range_hard[is_mem64 ? 2 : 1].start;
+	resource_size_t fixed_end = bus->fixed_range_hard[is_mem64 ? 2 : 1].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
+	if (min_size < fixed_size)
+		min_size = fixed_size;
 
 	if (!b_res)
 		return -ENOSPC;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 26aa59cb6220..7a4d62d84bc1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -572,6 +572,12 @@ struct pci_bus {
 	struct list_head resources;	/* Address space routed to this bus */
 	struct resource busn_res;	/* Bus numbers routed to this bus */
 
+	/*
+	 * If there are fixed resources in the bridge window, the hard range
+	 * contains the lowest and the highest addresses of them.
+	 */
+	struct resource fixed_range_hard[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 17/21] PCI: Calculate boundaries for bridge windows
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

If a bridge window contains fixed areas (that is, PCIe devices with
immovable BARs are located on this bus), the window must be allocated
within a bounded memory area, limited by the window size and by the
address range of the fixed resources, calculated as follows:

           | <--     bus's fixed_range_hard   --> |
  | <--  fixed_range_hard.end - window size   --> |
           | <--  fixed_range_hard.start + window size   --> |
  | <--                bus's fixed_range_soft            --> |
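
The arithmetic behind the diagram can be sketched in isolation. This is
only an illustration, not the kernel code: struct range and soft_range()
are hypothetical stand-ins for struct resource and for the per-window
body of pci_bus_update_fixed_range_soft(), and the narrowing by sibling
buses is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive [start, end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Soft range for a window of window_size bytes that has to cover 'hard'.
 * The window may slide left until its end meets hard.end, and right
 * until its start meets hard.start.
 */
static struct range soft_range(struct range hard, uint64_t window_size)
{
	struct range soft;

	assert(hard.start <= hard.end);	/* the hard range must be populated */
	assert(window_size > 0);

	soft.start = hard.end - window_size + 1;
	soft.end = hard.start + window_size - 1;

	/*
	 * If the window is smaller than the hard range (or the subtraction
	 * above wrapped around), fall back to the hard range itself.
	 */
	if (soft.start > hard.start)
		soft.start = hard.start;
	if (soft.end < hard.end)
		soft.end = hard.end;

	return soft;
}
```

For instance, a 0x4000-byte window around a hard range of
0x10000-0x10fff may be placed anywhere within 0xd000-0x13fff.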

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 56 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  4 ++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index a1fd7f3c5ea8..f4737339d5ec 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1809,6 +1809,61 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+static void pci_bus_update_fixed_range_soft(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_bus *parent = bus->parent;
+	int idx;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_fixed_range_soft(dev->subordinate);
+
+	if (!parent || !bus->self)
+		return;
+
+	for (idx = 0; idx < ARRAY_SIZE(bus->fixed_range_hard); ++idx) {
+		struct resource *r;
+		resource_size_t soft_start, soft_end;
+		resource_size_t hard_start = bus->fixed_range_hard[idx].start;
+		resource_size_t hard_end = bus->fixed_range_hard[idx].end;
+
+		if (hard_start > hard_end)
+			continue;
+
+		r = bus->resource[idx];
+
+		soft_start = hard_end - resource_size(r) + 1;
+		soft_end = hard_start + resource_size(r) - 1;
+
+		if (soft_start > hard_start)
+			soft_start = hard_start;
+
+		if (soft_end < hard_end)
+			soft_end = hard_end;
+
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			struct pci_bus *sibling = dev->subordinate;
+			resource_size_t s_start, s_end;
+
+			if (!sibling || sibling == bus)
+				continue;
+
+			s_start = sibling->fixed_range_hard[idx].start;
+			s_end = sibling->fixed_range_hard[idx].end;
+
+			if (s_start > s_end)
+				continue;
+
+			if (s_end < hard_start && s_end > soft_start)
+				soft_start = s_end;
+		}
+
+		bus->fixed_range_soft[idx].start = soft_start;
+		bus->fixed_range_soft[idx].end = soft_end;
+	}
+}
+
 /*
  * first try will not touch pci bridge res
  * second and later try will clear small leaf bridge res
@@ -1847,6 +1902,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	/* Depth first, calculate sizes and alignments of all
 	   subordinate buses. */
 	__pci_bus_size_bridges(bus, add_list);
+	pci_bus_update_fixed_range_soft(bus);
 
 	/* Depth last, allocate resources and update the hardware. */
 	__pci_bus_assign_resources(bus, add_list, &fail_head);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7a4d62d84bc1..75a56db73ad4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -574,9 +574,11 @@ struct pci_bus {
 
 	/*
 	 * If there are fixed resources in the bridge window, the hard range
-	 * contains the lowest and the highest addresses of them.
+	 * contains the lowest and the highest addresses of them, and this
+	 * bridge window must reside within the soft range.
 	 */
 	struct resource fixed_range_hard[PCI_BRIDGE_RESOURCE_NUM];
+	struct resource fixed_range_soft[PCI_BRIDGE_RESOURCE_NUM];
 
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 18/21] PCI: Make sure bridge windows include their fixed BARs
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Take the previously calculated boundaries into account when allocating a
bridge window: set the lowest allowed address from the soft range, then
check that the allocated window covers the fixed (hard) range.
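
The check applied to the allocation result can be sketched on its own.
This is only an illustration under assumptions: struct range and
window_covers_fixed() are hypothetical names, not kernel API; the
"start < end" test for a populated fixed range is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive [start, end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * A fixed range initialised to { (uint64_t)-1, 0 } means "no fixed
 * resources"; the patch tests for a populated range with start < end.
 */
static bool fixed_range_is_set(struct range fixed)
{
	return fixed.start < fixed.end;
}

/* True if the allocated window is acceptable with respect to 'fixed'. */
static bool window_covers_fixed(struct range window, struct range fixed)
{
	assert(window.start <= window.end);

	if (!fixed_range_is_set(fixed))
		return true;	/* nothing immovable to cover */

	return window.start <= fixed.start && window.end >= fixed.end;
}
```

When the check fails, the patch releases the resource and returns an
error, so the caller sees the window as unassigned.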

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-res.c | 31 +++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index a9784144d6f2..ce2d2aeedbd3 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		 * this is an already-configured bridge window, its start
 		 * overrides "min".
 		 */
-		if (avail.start)
+		if (min_used < avail.start)
 			min_used = avail.start;
 
 		max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 732d18f60f1b..04442339548d 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,22 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	resource_size_t start = (resource_size_t)-1;
+	resource_size_t end = 0;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+	if (dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+		struct pci_bus *child_bus = dev->subordinate;
+		int b_resno = resno - PCI_BRIDGE_RESOURCES;
+		resource_size_t soft_start = child_bus->fixed_range_soft[b_resno].start;
+
+		start = child_bus->fixed_range_hard[b_resno].start;
+		end = child_bus->fixed_range_hard[b_resno].end;
+		if (start < end)
+			min = soft_start;
+	}
+
 	/*
 	 * First, try exact prefetching match.  Even if a 64-bit
 	 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +275,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
-		return 0;
+		goto check_fixed;
 
 	/*
 	 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +287,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 					     IORESOURCE_PREFETCH,
 					     pcibios_align_resource, dev);
 		if (ret == 0)
-			return 0;
+			goto check_fixed;
 	}
 
 	/*
@@ -287,6 +300,20 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
+check_fixed:
+	if (ret == 0 && start < end) {
+		if (res->start > start || res->end < end) {
+			dev_err(&bus->dev, "%s: fixed area 0x%llx-0x%llx for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)\n",
+				__func__,
+				(unsigned long long)start, (unsigned long long)end,
+				dev_name(&dev->dev),
+				res, (unsigned long long)res->start,
+				(unsigned long long)res->end);
+			release_resource(res);
+			return -1;
+		}
+	}
+
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 19/21] PCI: Prioritize fixed BAR assigning over the movable ones
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

The allocated bridge windows are big enough to house all the child
bridges and BARs, but the fixed resources must be assigned first, so that
the movable ones can later share the rest of the window. The assignment
order is:

 1. Bridge windows with fixed areas;
 2. Fixed BARs;
 3. The rest of BARs and bridge windows.
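
The per-pass filter implied by this order can be sketched as a
predicate. This is only an illustration: struct request, its three
flags, the upper-case enum names and both helpers are hypothetical; the
conditions are the inverse of the "continue" tests in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum assign_step {
	ASSIGN_FIXED_BRIDGE_WINDOWS,
	ASSIGN_FIXED_RESOURCES,
	ASSIGN_FLOAT_RESOURCES,
};

/* Hypothetical summary of one entry of the request list. */
struct request {
	bool is_fixed;		/* IORESOURCE_PCI_FIXED is set */
	bool is_bridge;		/* a bridge window, not a BAR */
	bool is_fixed_bridge;	/* bridge window with a populated fixed range */
};

/* True if 'req' is assigned during 'step'. */
static bool handled_in_step(const struct request *req, enum assign_step step)
{
	assert(req != NULL);

	switch (step) {
	case ASSIGN_FIXED_BRIDGE_WINDOWS:
		return req->is_fixed_bridge;
	case ASSIGN_FIXED_RESOURCES:
		return req->is_fixed && !req->is_bridge;
	case ASSIGN_FLOAT_RESOURCES:
		return !req->is_fixed && !req->is_fixed_bridge;
	}
	return false;
}

/* Number of passes that pick up 'req'; 1 is the expected value. */
static int steps_handling(const struct request *req)
{
	int step, n = 0;

	for (step = ASSIGN_FIXED_BRIDGE_WINDOWS;
	     step <= ASSIGN_FLOAT_RESOURCES; ++step)
		if (handled_in_step(req, (enum assign_step)step))
			++n;
	return n;
}
```

Under this predicate a request with is_fixed and is_bridge set but
is_fixed_bridge clear is not picked up by any pass; whether that
combination can occur in practice is not settled by the patch text.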

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 69 ++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f4737339d5ec..932a6c020d10 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -272,31 +272,54 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 	}
 }
 
-/**
- * assign_requested_resources_sorted() - satisfy resource requests
- *
- * @head : head of the list tracking requests for resources
- * @fail_head : head of the list tracking requests that could
- *		not be allocated
- *
- * Satisfy resource requests of each element in the list. Add
- * requests that could not satisfied to the failed_list.
- */
-static void assign_requested_resources_sorted(struct list_head *head,
-				 struct list_head *fail_head)
+enum assign_step {
+	assign_fixed_bridge_windows,
+	assign_fixed_resources,
+	assign_float_resources,
+};
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step)
 {
 	struct resource *res;
 	struct pci_dev_resource *dev_res;
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		bool is_fixed;
+		bool is_fixed_bridge;
+		bool is_bridge;
+
 		if (pci_dev_is_ignored(dev_res->dev))
 			continue;
 
 		res = dev_res->res;
+		if (!resource_size(res))
+			continue;
+
 		idx = res - &dev_res->dev->resource[0];
-		if (resource_size(res) &&
-		    pci_assign_resource(dev_res->dev, idx)) {
+		is_fixed = res->flags & IORESOURCE_PCI_FIXED;
+		is_bridge = dev_res->dev->subordinate && idx >= PCI_BRIDGE_RESOURCES;
+
+		if (is_bridge) {
+			struct pci_bus *child = dev_res->dev->subordinate;
+			int b_res_idx = pci_get_bridge_resource_idx(res);
+			struct resource *fixed_res = &child->fixed_range_hard[b_res_idx];
+
+			is_fixed_bridge = fixed_res->start < fixed_res->end;
+		} else {
+			is_fixed_bridge = false;
+		}
+
+		if (assign_fixed_bridge_windows == step && !is_fixed_bridge)
+			continue;
+		else if (assign_fixed_resources == step && (!is_fixed || is_bridge))
+			continue;
+		else if (assign_float_resources == step && (is_fixed || is_fixed_bridge))
+			continue;
+
+		if (pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head) {
 				/*
 				 * if the failed res is for ROM BAR, and it will
@@ -315,6 +338,24 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	}
 }
 
+/**
+ * assign_requested_resources_sorted() - satisfy resource requests
+ *
+ * @head : head of the list tracking requests for resources
+ * @fail_head : head of the list tracking requests that could
+ *		not be allocated
+ *
+ * Satisfy resource requests of each element in the list. Add
+ * requests that could not be satisfied to the failed_list.
+ */
+static void assign_requested_resources_sorted(struct list_head *head,
+					      struct list_head *fail_head)
+{
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_bridge_windows);
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_resources);
+	_assign_requested_resources_sorted(head, fail_head, assign_float_resources);
+}
+
 static unsigned long pci_fail_res_type_mask(struct list_head *fail_head)
 {
 	struct pci_dev_resource *fail_res;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 19/21] PCI: Prioritize fixed BAR assigning over the movable ones
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Sergey Miroshnichenko, Bjorn Helgaas, linux

The allocated bridge windows are big enough to house all the child
bridges and BARs, but the fixed resources must be assigned first, so that
the movable ones can later share the rest of the window. The assignment
order is:

 1. Bridge windows with fixed areas;
 2. Fixed BARs;
 3. The rest of BARs and bridge windows.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 69 ++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f4737339d5ec..932a6c020d10 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -272,31 +272,54 @@ static void reassign_resources_sorted(struct list_head *realloc_head,
 	}
 }
 
-/**
- * assign_requested_resources_sorted() - satisfy resource requests
- *
- * @head : head of the list tracking requests for resources
- * @fail_head : head of the list tracking requests that could
- *		not be allocated
- *
- * Satisfy resource requests of each element in the list. Add
- * requests that could not satisfied to the failed_list.
- */
-static void assign_requested_resources_sorted(struct list_head *head,
-				 struct list_head *fail_head)
+enum assign_step {
+	assign_fixed_bridge_windows,
+	assign_fixed_resources,
+	assign_float_resources,
+};
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+					       struct list_head *fail_head,
+					       enum assign_step step)
 {
 	struct resource *res;
 	struct pci_dev_resource *dev_res;
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		bool is_fixed;
+		bool is_fixed_bridge;
+		bool is_bridge;
+
 		if (pci_dev_is_ignored(dev_res->dev))
 			continue;
 
 		res = dev_res->res;
+		if (!resource_size(res))
+			continue;
+
 		idx = res - &dev_res->dev->resource[0];
-		if (resource_size(res) &&
-		    pci_assign_resource(dev_res->dev, idx)) {
+		is_fixed = res->flags & IORESOURCE_PCI_FIXED;
+		is_bridge = dev_res->dev->subordinate && idx >= PCI_BRIDGE_RESOURCES;
+
+		if (is_bridge) {
+			struct pci_bus *child = dev_res->dev->subordinate;
+			int b_res_idx = pci_get_bridge_resource_idx(res);
+			struct resource *fixed_res = &child->fixed_range_hard[b_res_idx];
+
+			is_fixed_bridge = fixed_res->start < fixed_res->end;
+		} else {
+			is_fixed_bridge = false;
+		}
+
+		if (assign_fixed_bridge_windows == step && !is_fixed_bridge)
+			continue;
+		else if (assign_fixed_resources == step && (!is_fixed || is_bridge))
+			continue;
+		else if (assign_float_resources == step && (is_fixed || is_fixed_bridge))
+			continue;
+
+		if (pci_assign_resource(dev_res->dev, idx)) {
 			if (fail_head) {
 				/*
 				 * if the failed res is for ROM BAR, and it will
@@ -315,6 +338,24 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	}
 }
 
+/**
+ * assign_requested_resources_sorted() - satisfy resource requests
+ *
+ * @head : head of the list tracking requests for resources
+ * @fail_head : head of the list tracking requests that could
+ *		not be allocated
+ *
+ * Satisfy resource requests of each element in the list. Add
+ * requests that could not be satisfied to the failed_list.
+ */
+static void assign_requested_resources_sorted(struct list_head *head,
+					      struct list_head *fail_head)
+{
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_bridge_windows);
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_resources);
+	_assign_requested_resources_sorted(head, fail_head, assign_float_resources);
+}
+
 static unsigned long pci_fail_res_type_mask(struct list_head *fail_head)
 {
 	struct pci_dev_resource *fail_res;
-- 
2.20.1
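
For readers following the three passes, here is a minimal user-space sketch
of the filter in _assign_requested_resources_sorted(). It is a model, not
kernel code: struct model_res, handled_in_step() and assignment_order() are
hypothetical names, and the three booleans stand in for the values the patch
derives from IORESOURCE_PCI_FIXED, dev->subordinate and fixed_range_hard[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same three steps as the patch's enum assign_step. */
enum assign_step {
	assign_fixed_bridge_windows,
	assign_fixed_resources,
	assign_float_resources,
};

/* Hypothetical stand-in for the per-resource facts the patch computes. */
struct model_res {
	const char *name;
	bool is_fixed;        /* models res->flags & IORESOURCE_PCI_FIXED */
	bool is_bridge;       /* models a bridge window resource */
	bool is_fixed_bridge; /* bridge window that contains a fixed range */
};

/* Inverse of the "continue" conditions in the patch: true if not skipped. */
static bool handled_in_step(const struct model_res *r, enum assign_step step)
{
	/* Only a bridge window can be a "fixed bridge" in this model. */
	assert(!r->is_fixed_bridge || r->is_bridge);

	if (step == assign_fixed_bridge_windows)
		return r->is_fixed_bridge;
	if (step == assign_fixed_resources)
		return r->is_fixed && !r->is_bridge;
	return !r->is_fixed && !r->is_fixed_bridge;
}

/*
 * Walks the list once per step, as the new wrapper does, and stores the
 * visiting order in out[], which must have room for n entries.
 * Returns the number of resources visited.
 */
static size_t assignment_order(const struct model_res *res, size_t n,
			       const struct model_res **out)
{
	size_t count = 0;

	for (int step = assign_fixed_bridge_windows;
	     step <= assign_float_resources; step++)
		for (size_t i = 0; i < n; i++)
			if (handled_in_step(&res[i], (enum assign_step)step))
				out[count++] = &res[i];
	return count;
}
```

Feeding it a movable BAR, a fixed BAR, a window with a fixed range and a
plain window (in that list order) yields the window with the fixed range
first, then the fixed BAR, then the two remaining resources in list order.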


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 20/21] PCI: pciehp: Add support for the movable BARs feature
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

With movable BARs, adding a hotplugged device may affect the whole PCIe
domain starting from the root, so use pci_rescan_bus(), which handles the
rearrangement of existing BARs and bridge windows.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/hotplug/pciehp_pci.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index b9c1396db6fe..7c0871db5bae 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -56,12 +56,16 @@ int pciehp_configure_device(struct controller *ctrl)
 		goto out;
 	}
 
-	for_each_pci_bridge(dev, parent)
-		pci_hp_add_bridge(dev);
+	if (pci_movable_bars_enabled()) {
+		pci_rescan_bus(parent);
+	} else {
+		for_each_pci_bridge(dev, parent)
+			pci_hp_add_bridge(dev);
 
-	pci_assign_unassigned_bridge_resources(bridge);
-	pcie_bus_configure_settings(parent);
-	pci_bus_add_devices(parent);
+		pci_assign_unassigned_bridge_resources(bridge);
+		pcie_bus_configure_settings(parent);
+		pci_bus_add_devices(parent);
+	}
 
  out:
 	pci_unlock_rescan_remove();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH RFC v4 21/21] powerpc/pci: Fix crash with enabled movable BARs
  2019-03-11 13:31 ` Sergey Miroshnichenko
@ 2019-03-11 13:31   ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-11 13:31 UTC (permalink / raw)
  To: linux-pci, linuxppc-dev; +Cc: Bjorn Helgaas, linux, Sergey Miroshnichenko

Skip a resource in pnv_ioda_setup_pe_res() if it has the IORESOURCE_UNSET
flag set.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fa6af52b5219..353b36727f6a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2977,7 +2977,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
 	int index;
 	int64_t rc;
 
-	if (!res || !res->flags || res->start > res->end)
+	if (!res || !res->flags || res->start > res->end ||
+	    (res->flags & IORESOURCE_UNSET))
 		return;
 
 	if (res->flags & IORESOURCE_IO) {
-- 
2.20.1
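
The extended condition can be tried outside the kernel with a small model.
This is only a sketch: struct model_resource and resource_is_usable() are
hypothetical names, and the flag values are copied from
include/linux/ioport.h as an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as found in include/linux/ioport.h (assumed here). */
#define IORESOURCE_IO    0x00000100UL
#define IORESOURCE_MEM   0x00000200UL
#define IORESOURCE_UNSET 0x20000000UL /* no address assigned yet */

/* Hypothetical reduced form of struct resource. */
struct model_resource {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/* True if the early return in pnv_ioda_setup_pe_res() would NOT be taken. */
static bool resource_is_usable(const struct model_resource *res)
{
	/* The three conditions that were already present. */
	if (!res || !res->flags || res->start > res->end)
		return false;
	/* The condition added by this patch. */
	if (res->flags & IORESOURCE_UNSET)
		return false;
	return true;
}
```

A memory resource that carries IORESOURCE_UNSET but has start <= end passes
the three old conditions and is rejected only by the new one.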


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 01/21] PCI: Fix writing invalid BARs during pci_restore_state()
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 14:02   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 14:02 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

Hi Sergey,

Thanks for all your work here.  This is a long-standing problem, and
I'm glad you're working on it.

On Mon, Mar 11, 2019 at 04:31:02PM +0300, Sergey Miroshnichenko wrote:
> If BAR movement has happened (due to PCIe hotplug) after pci_save_state(),
> the saved addresses will become outdated. Restore the most recently
> calculated values, not the ones stored at an arbitrary moment.

Maybe pci_save_state() should not even save BAR values, since we have
no mechanism to determine whether those saved values are valid?

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7c1b362f599a..f006068be209 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1376,7 +1376,7 @@ static void pci_restore_config_space(struct pci_dev *pdev)
>  	if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
>  		pci_restore_config_space_range(pdev, 10, 15, 0, false);
>  		/* Restore BARs before the command register. */
> -		pci_restore_config_space_range(pdev, 4, 9, 10, false);
> +		pci_restore_bars(pdev);

pci_restore_bars() is a much longer call path than
pci_restore_config_space_range(), so it's a little bit scary just from
the complexity point of view, but I think this does make sense.

But I am concerned that we don't handle bridge BARs the same way (this
is an existing problem, not something you're introducing).

Bridge BARs (if implemented) are dwords 4 and 5, so they are currently
restored as part of this range:

  pci_restore_config_space_range(pdev, 0, 8, 0, false);

If we followed the same pattern as for type 0 devices, this would look
like:

  pci_restore_config_space_range(pdev, 6, 8, 0, false);
  pci_restore_config_space_range(pdev, 4, 5, 10, false);  /* BARs */
  pci_restore_config_space_range(pdev, 0, 3, 0, false);

And after your patch, it would look like:

  pci_restore_config_space_range(pdev, 6, 8, 0, false);
  pci_restore_bars(pdev);
  pci_restore_config_space_range(pdev, 0, 3, 0, false);

I think this would require a little enhancement in pci_restore_bars()
to filter the BAR range based on the hdr_type.

I would propose

  - adding a new patch to split up the bridge restore so the (0, 8)
    range is split into (6, 8); (4, 5); (0, 3), so it matches the type
    0 restore.

  - adding another new patch to filter the BAR range in
    pci_restore_bars().

  - updating this patch to use pci_restore_bars() in both the type 0
    and type 1 paths.

  - possibly adding a patch to make pci_save_state() not save BAR
    values in dev->saved_config_space, and any other changes needed to
    stop reading BARs from that area.

What do you think?

>  		pci_restore_config_space_range(pdev, 0, 3, 0, false);
>  	} else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>  		pci_restore_config_space_range(pdev, 12, 15, 0, false);
> -- 
> 2.20.1
> 
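
To visualise the (6, 8); (4, 5); (0, 3) split proposed in this reply, the
following user-space sketch records the order in which config dwords would
be written. It is a model under stated assumptions: struct restore_log and
the model_* helpers are hypothetical, and walking each range from its last
dword down to its first is an assumption about
pci_restore_config_space_range(), not something shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DWORDS 16

/* Hypothetical log of the dword indices in the order they are restored. */
struct restore_log {
	int order[MODEL_MAX_DWORDS];
	size_t count;
};

/* Records dwords from 'end' down to 'start' (direction is an assumption). */
static void model_restore_range(struct restore_log *log, int start, int end)
{
	for (int index = end; index >= start; index--) {
		assert(log->count < MODEL_MAX_DWORDS);
		log->order[log->count++] = index;
	}
}

/* The split proposed for type 1 (bridge) headers. */
static void model_restore_bridge_low_dwords(struct restore_log *log)
{
	model_restore_range(log, 6, 8); /* bus numbers, I/O and memory windows */
	model_restore_range(log, 4, 5); /* bridge BARs */
	model_restore_range(log, 0, 3); /* IDs, command/status, class, header */
}
```

Under that assumption the split keeps the overall order of a single (0, 8)
call, with the BAR dwords 5 and 4 still written before dword 1, which holds
the command register. What the split buys is an isolated (4, 5) step that
can later be swapped for pci_restore_bars().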

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()
  2019-03-11 13:31   ` Sergey Miroshnichenko
@ 2019-03-26 19:00     ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 19:00 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Srinath Mannam, Marta Rybczynska,
	linux-kernel

[+cc Srinath, Marta, LKML]

On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>  CPU0                                      CPU1
> 
>  pci_enable_device_mem()                   pci_enable_device_mem()
>    pci_enable_bridge()                       pci_enable_bridge()
>      pci_is_enabled()
>        return false;
>      atomic_inc_return(enable_cnt)
>      Start actual enabling the bridge
>      ...                                       pci_is_enabled()
>      ...                                         return true;
>      ...                                   Start memory requests <-- FAIL
>      ...
>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
> with mutexes.

This is a subtle issue that we've tried to fix before, but we've never
had a satisfactory solution, so I hope you've figured out the right
fix.

I'll include some links to previous discussion.  This patch is very
similar to [2], which we didn't actually apply.  We did apply the
patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
upstream bridges"), but it caused the regressions reported in [4,5],
so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
enabling upstream bridges"").

I think the underlying design problem is that we have a driver for
device B calling pci_enable_device(), and it is changing the state of
device A (an upstream bridge).  The model generally is that a driver
should only touch the device it is bound to.

It's tricky to get the locking right when several children of device A
all need to operate on A.

That's all to say I'll have to think carefully about this particular
patch, so I'll go on to the others and come back to this one.

Bjorn

[1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch

[2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch

[3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
    [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently

[5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@kernel.dk/T/#u
    iwlwifi firmware load broken in current -git

[6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
    [RFC PATCH] nvme: avoid race-conditions when enabling devices

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
>  drivers/pci/probe.c |  1 +
>  include/linux/pci.h |  1 +
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f006068be209..895201d4c9e6 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  	struct pci_dev *bridge;
>  	int retval;
>  
> +	mutex_lock(&dev->enable_mutex);
> +
>  	bridge = pci_upstream_bridge(dev);
>  	if (bridge)
>  		pci_enable_bridge(bridge);
> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  	if (pci_is_enabled(dev)) {
>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
> +		mutex_unlock(&dev->enable_mutex);
>  		return;
>  	}
>  
> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  		pci_err(dev, "Error enabling bridge (%d), continuing\n",
>  			retval);
>  	pci_set_master(dev);
> +	mutex_unlock(&dev->enable_mutex);
>  }
>  
>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>  {
>  	struct pci_dev *bridge;
> +	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
> +	bool need_lock = !dev->subordinate;
>  	int err;
>  	int i, bars = 0;
>  
> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>  		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>  	}
>  
> -	if (atomic_inc_return(&dev->enable_cnt) > 1)
> +	if (need_lock)
> +		mutex_lock(&dev->enable_mutex);
> +	if (pci_is_enabled(dev)) {
> +		if (need_lock)
> +			mutex_unlock(&dev->enable_mutex);
>  		return 0;		/* already enabled */
> +	}
>  
>  	bridge = pci_upstream_bridge(dev);
>  	if (bridge)
> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>  			bars |= (1 << i);
>  
>  	err = do_pci_enable_device(dev, bars);
> -	if (err < 0)
> -		atomic_dec(&dev->enable_cnt);
> +	if (err >= 0)
> +		atomic_inc(&dev->enable_cnt);
> +	if (need_lock)
> +		mutex_unlock(&dev->enable_mutex);
>  	return err;
>  }
>  
> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
>  	if (dr)
>  		dr->enabled = 0;
>  
> +	mutex_lock(&dev->enable_mutex);
>  	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
>  		      "disabling already-disabled device");
>  
> -	if (atomic_dec_return(&dev->enable_cnt) != 0)
> +	if (atomic_dec_return(&dev->enable_cnt) != 0) {
> +		mutex_unlock(&dev->enable_mutex);
>  		return;
> +	}
>  
>  	do_pci_disable_device(dev);
>  
>  	dev->is_busmaster = 0;
> +
> +	mutex_unlock(&dev->enable_mutex);
>  }
>  EXPORT_SYMBOL(pci_disable_device);
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 2ec0df04e0dc..977a127ce791 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>  	INIT_LIST_HEAD(&dev->bus_list);
>  	dev->dev.type = &pci_dev_type;
>  	dev->bus = pci_bus_get(bus);
> +	mutex_init(&dev->enable_mutex);
>  
>  	return dev;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 77448215ef5b..cb2760a31fe2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -419,6 +419,7 @@ struct pci_dev {
>  	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
>  	pci_dev_flags_t dev_flags;
>  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
> +	struct mutex	enable_mutex;
>  
>  	u32		saved_config_space[16]; /* Config space saved at suspend time */
>  	struct hlist_head saved_cap_space;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 03/21] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 19:13   ` Bjorn Helgaas
  2019-03-27 17:13       ` Sergey Miroshnichenko
  -1 siblings, 1 reply; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 19:13 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:04PM +0300, Sergey Miroshnichenko wrote:
> After updating the bridge window resources, the PCI_COMMAND_IO and
> PCI_COMMAND_MEMORY bits of the bridge must be addressed as well.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 895201d4c9e6..69898fe5255e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1622,6 +1622,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  		pci_enable_bridge(bridge);
>  
>  	if (pci_is_enabled(dev)) {
> +		int i, bars = 0;
> +
> +		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
> +			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
> +				bars |= (1 << i);
> +		}
> +		do_pci_enable_device(dev, bars);

In what situation is this needed, exactly?  This code already exists
in pci_enable_device_flags().  Why isn't that enough?

I guess maybe there's some case where we enable the bridge, then
assign bridge windows, then enable a downstream device?

Does this fix a bug with current hotplug?

>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
>  		mutex_unlock(&dev->enable_mutex);
> -- 
> 2.20.1
> 
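
The mask built by the quoted loop can be examined in isolation. The sketch
below is a user-space model: MODEL_BRIDGE_RESOURCES and MODEL_RESOURCE_COUNT
are hypothetical stand-ins for PCI_BRIDGE_RESOURCES and
DEVICE_COUNT_RESOURCE, whose real values depend on the kernel configuration,
and the flag values are assumed from include/linux/ioport.h.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as found in include/linux/ioport.h (assumed here). */
#define IORESOURCE_IO  0x00000100UL
#define IORESOURCE_MEM 0x00000200UL

/* Hypothetical indices; the kernel's values depend on the configuration. */
#define MODEL_BRIDGE_RESOURCES 7
#define MODEL_RESOURCE_COUNT   11

/* Sets one bit per bridge window that is an I/O or memory resource. */
static int model_bridge_window_mask(const unsigned long *flags)
{
	int bars = 0;

	assert(flags != NULL);
	for (int i = MODEL_BRIDGE_RESOURCES; i < MODEL_RESOURCE_COUNT; i++)
		if (flags[i] & (IORESOURCE_MEM | IORESOURCE_IO))
			bars |= 1 << i;
	return bars;
}
```

Only indices in the bridge window range contribute, so a device BAR at
index 0 never sets a bit in this mask.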

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 05/21] PCI: hotplug: Add a flag for the movable BARs feature
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 19:24   ` Bjorn Helgaas
  2019-03-27 17:16       ` Sergey Miroshnichenko
  -1 siblings, 1 reply; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 19:24 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:06PM +0300, Sergey Miroshnichenko wrote:
> If a new PCIe device has been hot-plugged between two active ones
> without a big enough gap between their BARs, 

Just to speak precisely here, a hot-added device is not "between" two
active ones because the new device has zeros in its BARs.

BARs from different devices can be interleaved arbitrarily, subject to
bridge window constraints, so we can really only speak about a *BAR*
(not the entire device) being between two other BARs.

Also, I don't think there's anything here that is PCIe-specific, so we
should talk about "PCI", not "PCIe".

> these BARs should be moved
> if their drivers support this feature. The drivers should be notified
> and paused during the procedure:
> 
> 1)                 dev 8 (new)
>                        |
>                        v
> .. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> 
> 2)                             dev 8
>                                  |
>                                  v
> .. |  dev 3  |  dev 3  | -->           --> |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  | -->           --> |  BAR 0  |  BAR 0  |
> 
> 3)
> 
> .. |  dev 3  |  dev 3  |  dev 8  |  dev 8  |  dev 5  |  dev 7  |
> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
> 
> Thus, prior reservation of memory regions by BIOS/bootloader/firmware
> is not required anymore for the PCIe hotplug.
> 
> The PCI_MOVABLE_BARS flag is set by the platform if this feature is
> supported and tested, but can be overridden by the following command
> line option:
>     pcie_movable_bars={ off | force }

A chicken switch to turn this functionality off is OK, but I think it
should be enabled by default.  There isn't anything about this that's
platform-specific, is there?

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  7 ++++++
>  drivers/pci/pci.c                             | 24 +++++++++++++++++++
>  include/linux/pci.h                           |  2 ++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2b8ee90bb644..d40eaf993f80 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3417,6 +3417,13 @@
>  		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>  			all PCIe root ports use INTx for all services).
>  
> +	pcie_movable_bars=[PCIE]
> +			Override the movable BARs support detection:
> +		off
> +			Disable even if supported by the platform
> +		force
> +			Enable even if not explicitly declared as supported
> +
>  	pcmv=		[HW,PCMCIA] BadgePAD 4
>  
>  	pd_ignore_unused
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 69898fe5255e..4dac49a887ec 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>  }
>  __setup("pcie_port_pm=", pcie_port_pm_setup);
>  
> +static bool pcie_movable_bars_off;
> +static bool pcie_movable_bars_force;
> +static int __init pcie_movable_bars_setup(char *str)
> +{
> +	if (!strcmp(str, "off"))
> +		pcie_movable_bars_off = true;
> +	else if (!strcmp(str, "force"))
> +		pcie_movable_bars_force = true;
> +	return 1;
> +}
> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
> +
> +bool pci_movable_bars_enabled(void)
> +{
> +	if (pcie_movable_bars_off)
> +		return false;
> +
> +	if (pcie_movable_bars_force)
> +		return true;
> +
> +	return pci_has_flag(PCI_MOVABLE_BARS);
> +}
> +EXPORT_SYMBOL(pci_movable_bars_enabled);
> +
>  /* Time to wait after a reset for device to become responsive */
>  #define PCIE_RESET_READY_POLL_MS 60000
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cb2760a31fe2..cbe661aff9f5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -866,6 +866,7 @@ enum {
>  	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>  	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>  	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
> +	PCI_MOVABLE_BARS	= 0x00000080,	/* Runtime BAR reassign after hotplug */
>  };
>  
>  /* These external functions are only available when PCI support is enabled */
> @@ -1345,6 +1346,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>  void pci_setup_bridge(struct pci_bus *bus);
>  resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>  					 unsigned long type);
> +bool pci_movable_bars_enabled(void);
>  
>  #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>  #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 07/21] PCI: Wake up bridges during rescan when movable BARs enabled
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 19:28   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 19:28 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:08PM +0300, Sergey Miroshnichenko wrote:
> Use the PM runtime methods to wake up the bridges before accessing
> their config space.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/probe.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 88350dd56344..dc935f82a595 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3252,6 +3252,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  {
>  	struct pci_dev *dev;
>  
> +	pm_runtime_get_sync(&bus->dev);

This should be part of the patch that adds the config space access
so we can tell specifically what code requires the wakeup.

>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>  		struct pci_bus *child = dev->subordinate;
>  
> @@ -3278,6 +3280,8 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
>  			dev->driver->rescan_done(dev);
>  		}
>  	}
> +
> +	pm_runtime_put(&bus->dev);
>  }
>  
>  /**
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 08/21] nvme-pci: Handle movable BARs
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:20     ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:20 UTC (permalink / raw)
  To: Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-kernel

[+cc Keith, Jens, Christoph, Sagi, linux-nvme, LKML]

On Mon, Mar 11, 2019 at 04:31:09PM +0300, Sergey Miroshnichenko wrote:
> Hotplugged devices can affect the existing ones by moving their BARs.
> PCI subsystem will inform the NVME driver about this by invoking
> reset_prepare()+reset_done(), then iounmap()+ioremap() must be called.

Do you mean the PCI core will invoke ->rescan_prepare() and
->rescan_done() (as opposed to *reset*)?

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/nvme/host/pci.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 92bad1c810ac..ccea3033a67a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -106,6 +106,7 @@ struct nvme_dev {
>  	unsigned int num_vecs;
>  	int q_depth;
>  	u32 db_stride;
> +	resource_size_t current_phys_bar;
>  	void __iomem *bar;
>  	unsigned long bar_mapped_size;
>  	struct work_struct remove_work;
> @@ -1672,13 +1673,16 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
> -	if (size <= dev->bar_mapped_size)
> +	if (dev->bar &&
> +	    dev->current_phys_bar == pci_resource_start(pdev, 0) &&
> +	    size <= dev->bar_mapped_size)
>  		return 0;
>  	if (size > pci_resource_len(pdev, 0))
>  		return -ENOMEM;
>  	if (dev->bar)
>  		iounmap(dev->bar);
> -	dev->bar = ioremap(pci_resource_start(pdev, 0), size);
> +	dev->current_phys_bar = pci_resource_start(pdev, 0);
> +	dev->bar = ioremap(dev->current_phys_bar, size);

dev->current_phys_bar is different from pci_resource_start() in the
case where the PCI core has moved the nvme BAR, but nvme has not yet
remapped it.

I'm not sure it's worth keeping track of current_phys_bar, as opposed
to always unmapping and remapping.  Is this a performance path?  I
think there are advantages to always exercising the same code path,
regardless of whether the BAR happened to be moved, e.g., if there's a
bug in the "BAR moved" path, it may be a heisenbug because whether we
exercise that path depends on the current configuration.

If you do need to cache current_phys_bar, maybe this, so it's a little
easier to see that you're not changing the ioremap() itself:

  dev->bar = ioremap(pci_resource_start(pdev, 0), size);
  dev->current_phys_bar = pci_resource_start(pdev, 0);
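
For illustration, the "always unmap and remap" alternative can be modelled
in user space. This is only a sketch under stated assumptions: struct
model_dev, fake_ioremap(), fake_iounmap() and model_remap_bar() are
hypothetical stand-ins for struct nvme_dev, ioremap(), iounmap() and
nvme_remap_bar(), and the -1 return value stands in for -ENOMEM. It is not
the nvme driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nvme_dev: only what the sketch needs. */
struct model_dev {
	uint64_t res_start;       /* models pci_resource_start(pdev, 0) */
	uint64_t res_len;         /* models pci_resource_len(pdev, 0) */
	void *bar;                /* models the void __iomem *bar mapping */
	uint64_t bar_phys;        /* address the current mapping was made for */
	size_t bar_mapped_size;
	unsigned int remap_count; /* instrumentation for the sketch only */
};

/* Hypothetical user-space stand-ins for ioremap()/iounmap(). */
static void *fake_ioremap(uint64_t phys, size_t size)
{
	(void)phys;
	return malloc(size);
}

static void fake_iounmap(void *addr)
{
	free(addr);
}

/*
 * Always drop the old mapping and create a new one, so the same code path
 * runs whether or not the BAR was moved. Returns 0 on success, -1 on error.
 */
static int model_remap_bar(struct model_dev *dev, size_t size)
{
	assert(dev != NULL);

	if (size == 0 || size > dev->res_len)
		return -1;

	if (dev->bar) {
		fake_iounmap(dev->bar);
		dev->bar = NULL;
	}
	dev->bar_mapped_size = 0;

	dev->bar = fake_ioremap(dev->res_start, size);
	if (!dev->bar)
		return -1;

	dev->bar_phys = dev->res_start;
	dev->bar_mapped_size = size;
	dev->remap_count++;
	return 0;
}
```

In this model there is no "BAR moved" special case left to test: changing
res_start between two calls and not changing it exercise exactly the same
statements.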

>  	if (!dev->bar) {
>  		dev->bar_mapped_size = 0;
>  		return -ENOMEM;
> @@ -2504,6 +2508,8 @@ static void nvme_reset_work(struct work_struct *work)
>  	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>  		goto out;
>  
> +	nvme_remap_bar(dev, db_bar_size(dev, 0));

How is this change connected to rescan?  This looks reset-related.

>  	/*
>  	 * If we're called to reset a live controller first shut it down before
>  	 * moving on.
> @@ -2910,6 +2916,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
>  	flush_work(&dev->ctrl.reset_work);
>  }
>  
> +void nvme_rescan_prepare(struct pci_dev *pdev)
> +{
> +	struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> +	nvme_dev_disable(dev, false);
> +	nvme_dev_unmap(dev);
> +	dev->bar = NULL;
> +}
> +
> +void nvme_rescan_done(struct pci_dev *pdev)
> +{
> +	struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> +	nvme_dev_map(dev);
> +	nvme_reset_ctrl_sync(&dev->ctrl);
> +}
> +
>  static const struct pci_error_handlers nvme_err_handler = {
>  	.error_detected	= nvme_error_detected,
>  	.slot_reset	= nvme_slot_reset,
> @@ -2974,6 +2997,8 @@ static struct pci_driver nvme_driver = {
>  	},
>  	.sriov_configure = pci_sriov_configure_simple,
>  	.err_handler	= &nvme_err_handler,
> +	.rescan_prepare	= nvme_rescan_prepare,
> +	.rescan_done	= nvme_rescan_done,
>  };
>  
>  static int __init nvme_init(void)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:28   ` Bjorn Helgaas
  2019-03-27 17:03     ` David Laight
  -1 siblings, 1 reply; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:28 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:10PM +0300, Sergey Miroshnichenko wrote:
> If a PCIe device driver doesn't yet have support for movable BARs,
> mark device's BARs with IORESOURCE_PCI_FIXED.

I'm hesitant about using IORESOURCE_PCI_FIXED for this purpose.  That
was originally added to describe resources that cannot be changed
because they're hardwired in the device, e.g., legacy resources and
Enhanced Allocation resources.

In general, I think the bits in res->flags should tell us things about
the hardware.  This particular use would be something about the
*driver*, and I think we should figure that out by looking at
dev->driver.
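
One possible shape for such a check, sketched as a user-space model. All
names here are hypothetical: model_bar_movable() is not an existing kernel
helper, the structs are cut-down stand-ins for struct pci_dev, struct
pci_driver and struct resource, and MODEL_RES_HW_FIXED is a placeholder for
IORESOURCE_PCI_FIXED. The sketch also assumes an unbound device is movable
and that a driver must provide both hooks; it ignores the VGA special case
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RES_HW_FIXED 0x10UL /* placeholder for IORESOURCE_PCI_FIXED */
#define MODEL_NUM_BARS 6

struct model_pci_dev;

/* Stand-in for the two hooks this series adds to struct pci_driver. */
struct model_pci_driver {
	void (*rescan_prepare)(struct model_pci_dev *dev);
	void (*rescan_done)(struct model_pci_dev *dev);
};

struct model_resource {
	unsigned long flags;
};

struct model_pci_dev {
	const struct model_pci_driver *driver; /* NULL when no driver is bound */
	struct model_resource resource[MODEL_NUM_BARS];
};

/* Example hook for a driver that supports movable BARs. */
static void model_noop_hook(struct model_pci_dev *dev)
{
	(void)dev;
}

/*
 * res->flags describes the hardware only; whether the driver can cope with
 * a moved BAR is derived from dev->driver at the time of the decision.
 */
static bool model_bar_movable(const struct model_pci_dev *dev, int bar)
{
	assert(dev != NULL && bar >= 0 && bar < MODEL_NUM_BARS);

	if (dev->resource[bar].flags & MODEL_RES_HW_FIXED)
		return false; /* hardwired in the device */
	if (!dev->driver)
		return true;  /* nobody is using the BAR */
	return dev->driver->rescan_prepare && dev->driver->rescan_done;
}
```

With this split, unbinding or rebinding a driver changes the answer without
any flag in the resource having to be set or cleared.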

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/probe.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index dc935f82a595..1cf6ec960236 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3262,6 +3262,21 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  		} else if (dev->driver &&
>  			   dev->driver->rescan_prepare) {
>  			dev->driver->rescan_prepare(dev);
> +		} else if (dev->driver || ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)) {
> +			int i;
> +
> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				struct resource *r = &dev->resource[i];
> +
> +				if (!r->flags || !r->parent ||
> +				    (r->flags & IORESOURCE_UNSET) ||
> +				    (r->flags & IORESOURCE_PCI_FIXED))
> +					continue;
> +
> +				r->flags |= IORESOURCE_PCI_FIXED;
> +				pci_warn(dev, "%s: no support for movable BARs, mark BAR %d (%pR) as fixed\n",
> +					 __func__, i, r);
> +			}
>  		}
>  	}
>  }
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 10/21] PCI: Fix assigning of fixed prefetchable resources
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:37   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:37 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:11PM +0300, Sergey Miroshnichenko wrote:
> Allow matching them to non-prefetchable windows, as it is done for movable
> resources.

Please make the commit log complete in itself, without requiring the
subject.  It's OK if you have to repeat the subject.

IIUC, this is actually a bug fix and is not strictly related to
movable resources.  We should be able to have an IORESOURCE_PCI_FIXED
prefetchable BAR in a non-prefetchable window.

I suppose the movable BARs feature exposes this case because, as currently
implemented, it marks many more BARs as IORESOURCE_PCI_FIXED.  I think
we should use something other than IORESOURCE_PCI_FIXED for that case,
so maybe this patch will end up being unnecessary?
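
The matching rule in question (a prefetchable BAR may live in a
non-prefetchable window, but not the other way around) can be sketched in
user space as follows. struct model_resource, model_contains() and
model_window_accepts() are hypothetical names; the flag values are the ones
I recall from include/linux/ioport.h and should be treated as assumptions of
the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match include/linux/ioport.h. */
#define IORESOURCE_TYPE_BITS 0x00001f00UL
#define IORESOURCE_IO        0x00000100UL
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_PREFETCH  0x00002000UL

/* Cut-down stand-in for struct resource. */
struct model_resource {
	uint64_t start;
	uint64_t end; /* inclusive */
	unsigned long flags;
};

/* Models resource_contains(): inner lies entirely within outer. */
static bool model_contains(const struct model_resource *outer,
			   const struct model_resource *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* True if a fixed BAR may be claimed from the given bridge window. */
static bool model_window_accepts(const struct model_resource *window,
				 const struct model_resource *bar)
{
	assert(window && bar);

	/* I/O BARs go to I/O windows, memory BARs to memory windows. */
	if ((bar->flags & IORESOURCE_TYPE_BITS) !=
	    (window->flags & IORESOURCE_TYPE_BITS))
		return false;

	/* A non-prefetchable BAR must not sit in a prefetchable window. */
	if ((window->flags & IORESOURCE_PREFETCH) &&
	    !(bar->flags & IORESOURCE_PREFETCH))
		return false;

	return model_contains(window, bar);
}
```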

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 3644feb13179..be7d4e6d7b65 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1301,15 +1301,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
>  {
>  	int i;
>  	struct resource *parent_r;
> -	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
> -			     IORESOURCE_PREFETCH;
> +	unsigned long mask = IORESOURCE_TYPE_BITS;
>  
>  	pci_bus_for_each_resource(b, parent_r, i) {
>  		if (!parent_r)
>  			continue;
>  
> -		if ((r->flags & mask) == (parent_r->flags & mask) &&
> -		    resource_contains(parent_r, r))
> +		if ((r->flags & mask) != (parent_r->flags & mask))
> +			continue;
> +
> +		if (parent_r->flags & IORESOURCE_PREFETCH &&
> +		    !(r->flags & IORESOURCE_PREFETCH))
> +			continue;
> +
> +		if (resource_contains(parent_r, r))
>  			request_resource(parent_r, r);
>  	}
>  }
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 11/21] PCI: Release and reassign the root bridge resources during rescan
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:41   ` Bjorn Helgaas
  2019-03-27 17:40       ` Sergey Miroshnichenko
  -1 siblings, 1 reply; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:41 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:12PM +0300, Sergey Miroshnichenko wrote:
> When the movable BARs feature is enabled, don't rely on the memory gaps
> reserved by the BIOS/bootloader/firmware, but instead rearrange the BARs
> and bridge windows starting from the root.
> 
> Endpoint device's BARs, after being released, are resorted and written
> back by the pci_assign_unassigned_root_bus_resources().
> 
> The last step of writing the recalculated windows to the bridges is done
> by the new pci_setup_bridges() function.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.h       |  1 +
>  drivers/pci/probe.c     | 22 ++++++++++++++++++++++
>  drivers/pci/setup-bus.c | 11 ++++++++++-
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 224d88634115..e06e8692a7b1 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -248,6 +248,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
>  				struct list_head *realloc_head,
>  				struct list_head *fail_head);
>  bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
> +void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
>  
>  void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1cf6ec960236..692752c71f71 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3299,6 +3299,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
>  	pm_runtime_put(&bus->dev);
>  }
>  
> +static void pci_setup_bridges(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child;
> +
> +		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
> +			continue;
> +
> +		child = dev->subordinate;
> +		if (child)
> +			pci_setup_bridges(child);
> +	}
> +
> +	if (bus->self)
> +		pci_setup_bridge(bus);
> +}
> +
>  /**
>   * pci_rescan_bus - Scan a PCI bus for devices
>   * @bus: PCI bus to scan
> @@ -3321,8 +3340,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>  		pci_bus_rescan_prepare(root);
>  
>  		max = pci_scan_child_bus(root);
> +
> +		pci_bus_release_root_bridge_resources(root);
>  		pci_assign_unassigned_root_bus_resources(root);
>  
> +		pci_setup_bridges(root);
>  		pci_bus_rescan_done(root);
>  	} else {
>  		max = pci_scan_child_bus(bus);
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index be7d4e6d7b65..36a1907d9509 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1584,7 +1584,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
>  		pci_printk(KERN_DEBUG, dev, "resource %d %pR released\n",
>  					PCI_BRIDGE_RESOURCES + idx, r);
>  		/* keep the old size */
> -		r->end = resource_size(r) - 1;
> +		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);

Doesn't this mean we're throwing away the information about the BAR
size, and we'll have to size the BAR again somewhere?  I would like to
avoid that.  But I don't know yet where you rely on this, so maybe
it's not possible to avoid it.
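
To make the concern concrete, here is a small user-space model of the two
release variants. struct model_resource, model_resource_size() and
model_release() are hypothetical stand-ins for struct resource,
resource_size() and the assignments in pci_bridge_release_resources(); the
keep_size parameter models the !pci_movable_bars_enabled() case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct resource. */
struct model_resource {
	uint64_t start;
	uint64_t end; /* inclusive */
	unsigned long flags;
};

/* Models resource_size(): end is inclusive, hence the +1. */
static uint64_t model_resource_size(const struct model_resource *r)
{
	return r->end - r->start + 1;
}

/*
 * Release a resource. With keep_size the range is rebased to start at 0 so
 * the size survives; without it the range collapses to [0, 0].
 */
static void model_release(struct model_resource *r, bool keep_size)
{
	uint64_t size;

	assert(r != NULL && r->end >= r->start);

	size = model_resource_size(r);
	r->end = keep_size ? size - 1 : 0;
	r->start = 0;
	r->flags = 0;
}
```

After a release with keep_size, a 1 MB resource still reports 1 MB; without
it, the model reports a size of 1, so the original size has to be recomputed
from somewhere else.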

>  		r->start = 0;
>  		r->flags = 0;
>  
> @@ -1637,6 +1637,15 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
>  		pci_bridge_release_resources(bus, type);
>  }
>  
> +void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
> +{
> +	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
> +	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
> +	pci_bus_release_bridge_resources(root_bus,
> +					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
> +					 whole_subtree);
> +}
> +
>  static void pci_bus_dump_res(struct pci_bus *bus)
>  {
>  	struct resource *res;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 12/21] PCI: Don't allow hotplugged devices to steal resources
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:55   ` Bjorn Helgaas
  2019-03-27 18:02       ` Sergey Miroshnichenko
  -1 siblings, 1 reply; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:55 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:13PM +0300, Sergey Miroshnichenko wrote:
> When movable BARs are enabled, the PCI subsystem at first releases
> all the bridge windows and then performs an attempt to assign new
> requested resources and re-assign the existing ones.

s/performs an attempt/attempts/

I guess "new requested resources" means "resources to newly hotplugged
devices"?

> If a hotplugged device gets its resources first, there could be no
> space left to re-assign resources of already working devices, which
> is unacceptable. If this happens, this patch marks one of the new
> devices with the new introduced flag PCI_DEV_IGNORE and retries the
> resource assignment.
> 
> This patch adds a new res_mask bitmask to the struct pci_dev for
> storing the indices of assigned resources.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/bus.c       |   5 ++
>  drivers/pci/pci.h       |  11 +++++
>  drivers/pci/probe.c     | 100 +++++++++++++++++++++++++++++++++++++++-
>  drivers/pci/setup-bus.c |  15 ++++++
>  include/linux/pci.h     |   1 +
>  5 files changed, 130 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 5cb40b2518f9..a9784144d6f2 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -311,6 +311,11 @@ void pci_bus_add_device(struct pci_dev *dev)
>  {
>  	int retval;
>  
> +	if (pci_dev_is_ignored(dev)) {
> +		pci_warn(dev, "%s: don't enable the ignored device\n", __func__);
> +		return;

I'm not sure about this.  Even if we're unable to assign space for all
the device's BARs, it still should respond to config accesses, and I
think it should show up in sysfs and lspci.

> +	}
> +
>  	/*
>  	 * Can not put in pci_device_add yet because resources
>  	 * are not assigned yet for some devices.
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index e06e8692a7b1..56b905068ac5 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -366,6 +366,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  
>  /* pci_dev priv_flags */
>  #define PCI_DEV_ADDED 0
> +#define PCI_DEV_IGNORE 1
>  
>  static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
>  {
> @@ -377,6 +378,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
>  	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
>  }
>  
> +static inline void pci_dev_ignore(struct pci_dev *dev, bool ignore)
> +{
> +	assign_bit(PCI_DEV_IGNORE, &dev->priv_flags, ignore);
> +}
> +
> +static inline bool pci_dev_is_ignored(const struct pci_dev *dev)
> +{
> +	return test_bit(PCI_DEV_IGNORE, &dev->priv_flags);
> +}
> +
>  #ifdef CONFIG_PCIEAER
>  #include <linux/aer.h>
>  
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 692752c71f71..62f4058a001f 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -3248,6 +3248,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>  	return max;
>  }
>  
> +static unsigned int pci_dev_res_mask(struct pci_dev *dev)
> +{
> +	unsigned int res_mask = 0;
> +	int i;
> +
> +	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
> +		struct resource *r = &dev->resource[i];
> +
> +		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
> +			continue;
> +
> +		res_mask |= (1 << i);
> +	}
> +
> +	return res_mask;
> +}
> +
>  static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  {
>  	struct pci_dev *dev;
> @@ -3257,6 +3274,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>  		struct pci_bus *child = dev->subordinate;
>  
> +		dev->res_mask = pci_dev_res_mask(dev);
> +
>  		if (child) {
>  			pci_bus_rescan_prepare(child);
>  		} else if (dev->driver &&
> @@ -3318,6 +3337,84 @@ static void pci_setup_bridges(struct pci_bus *bus)
>  		pci_setup_bridge(bus);
>  }
>  
> +static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +
> +	if (!bus)
> +		return NULL;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child_bus = dev->subordinate;
> +
> +		if (!pci_dev_is_added(dev) && !pci_dev_is_ignored(dev))
> +			return dev;
> +
> +		if (child_bus) {
> +			struct pci_dev *next_new_dev;
> +
> +			next_new_dev = pci_find_next_new_device(child_bus);
> +			if (next_new_dev)
> +				return next_new_dev;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static bool pci_bus_validate_resources(struct pci_bus *bus)

The name of this function should tell us what the return value means.
Just from the name "pci_bus_validate_resources", I can't tell whether we
call it for side-effects, or whether true or false indicates success.
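
As an example of a name that carries the meaning of its return value, the
res_mask comparison could be wrapped like this. Both function names are
hypothetical and the sketch is a user-space model of the bitmask check only,
not a proposal for the final interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Bits set in 'before' but missing in 'after': resources that were lost. */
static unsigned int model_lost_resources(unsigned int before,
					 unsigned int after)
{
	return before & ~after;
}

/* True means success: every previously assigned resource is assigned again. */
static bool model_all_resources_reassigned(unsigned int before,
					   unsigned int after)
{
	return model_lost_resources(before, after) == 0;
}
```

A caller then reads as "if (model_all_resources_reassigned(...)) break;",
which needs no comment to explain which outcome ends the retry loop.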

> +{
> +	struct pci_dev *dev;
> +	bool ret = true;
> +
> +	if (!bus)
> +		return false;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		struct pci_bus *child = dev->subordinate;
> +		unsigned int res_mask = pci_dev_res_mask(dev);
> +
> +		if (pci_dev_is_ignored(dev))
> +			continue;
> +
> +		if (dev->res_mask & ~res_mask) {
> +			pci_err(dev, "%s: Non-re-enabled resources found: 0x%x -> 0x%x\n",
> +				__func__, dev->res_mask, res_mask);

I don't think __func__ really tells users anything useful, so I would
just omit it.  Searching for the text of the message is almost as
good.

> +			ret = false;
> +		}
> +
> +		if (child && !pci_bus_validate_resources(child))
> +			ret = false;
> +	}
> +
> +	return ret;
> +}
> +
> +static void pci_reassign_root_bus_resources(struct pci_bus *root)
> +{
> +	do {
> +		struct pci_dev *next_new_dev;
> +
> +		pci_bus_release_root_bridge_resources(root);
> +		pci_assign_unassigned_root_bus_resources(root);
> +
> +		if (pci_bus_validate_resources(root))
> +			break;
> +
> +		next_new_dev = pci_find_next_new_device(root);
> +		if (!next_new_dev) {
> +			dev_err(&root->dev, "%s: failed to re-assign resources even after ignoring all the hotplugged devices\n",
> +				__func__);
> +			break;
> +		}
> +
> +		dev_warn(&root->dev, "%s: failed to re-assign resources, disable the next hotplugged device %s and retry\n",
> +			 __func__, dev_name(&next_new_dev->dev));
> +
> +		pci_dev_ignore(next_new_dev, true);
> +	} while (true);
> +}
> +
>  /**
>   * pci_rescan_bus - Scan a PCI bus for devices
>   * @bus: PCI bus to scan
> @@ -3341,8 +3438,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>  
>  		max = pci_scan_child_bus(root);
>  
> -		pci_bus_release_root_bridge_resources(root);
> -		pci_assign_unassigned_root_bus_resources(root);
> +		pci_reassign_root_bus_resources(root);
>  
>  		pci_setup_bridges(root);
>  		pci_bus_rescan_done(root);
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 36a1907d9509..551108f48df7 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -131,6 +131,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
>  {
>  	int i;
>  
> +	if (pci_dev_is_ignored(dev))
> +		return;
> +
>  	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>  		struct resource *r;
>  		struct pci_dev_resource *dev_res, *tmp;
> @@ -181,6 +184,9 @@ static void __dev_sort_resources(struct pci_dev *dev,
>  {
>  	u16 class = dev->class >> 8;
>  
> +	if (pci_dev_is_ignored(dev))
> +		return;
> +
>  	/* Don't touch classless devices or host bridges or ioapics.  */
>  	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
>  		return;
> @@ -284,6 +290,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
>  	int idx;
>  
>  	list_for_each_entry(dev_res, head, list) {
> +		if (pci_dev_is_ignored(dev_res->dev))
> +			continue;
> +
>  		res = dev_res->res;
>  		idx = res - &dev_res->dev->resource[0];
>  		if (resource_size(res) &&
> @@ -991,6 +1000,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>  		int i;
>  
> +		if (pci_dev_is_ignored(dev))
> +			continue;
> +
>  		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>  			struct resource *r = &dev->resource[i];
>  			resource_size_t r_size;
> @@ -1353,6 +1365,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
>  	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
>  
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		if (pci_dev_is_ignored(dev))
> +			continue;
> +
>  		pdev_assign_fixed_resources(dev);
>  
>  		b = dev->subordinate;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3d52f5538282..26aa59cb6220 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -369,6 +369,7 @@ struct pci_dev {
>  	 */
>  	unsigned int	irq;
>  	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
> +	unsigned int	res_mask;		/* Bitmask of assigned resources */
>  
>  	bool		match_driver;		/* Skip attaching driver */
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 14/21] PCI: Don't reserve memory for hotplug when enabled movable BARs
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:57   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:57 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:15PM +0300, Sergey Miroshnichenko wrote:
> pbus_size_mem() returns a precise amount of memory required to fit
> all the requested BARs and windows of children bridges.

Please make the commit log complete in itself, without requiring the
subject.

> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 9d93f2b32bf1..f9d605cd1725 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1229,7 +1229,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
>  
>  	case PCI_HEADER_TYPE_BRIDGE:
>  		pci_bridge_check_ranges(bus);
> -		if (bus->self->is_hotplug_bridge) {
> +		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
>  			additional_io_size  = pci_hotplug_io_size;
>  			additional_mem_size = pci_hotplug_mem_size;
>  		}
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 15/21] PCI: Allow the failed resources to be reassigned later
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 20:58   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 20:58 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:16PM +0300, Sergey Miroshnichenko wrote:
> Don't lose the size of the requested EP's BAR if it can't be fit
> in a current trial, so this can be retried.

s/EP/device/, since this applies equally to conventional PCI.

> But a failed bridge window must be dropped and recalculated in the
> next trial.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c |  3 ++-
>  drivers/pci/setup-res.c | 12 ++++++++++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index f9d605cd1725..c1559a4a8564 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -309,7 +309,8 @@ static void assign_requested_resources_sorted(struct list_head *head,
>  						    0 /* don't care */,
>  						    0 /* don't care */);
>  			}
> -			reset_resource(res);
> +			if (!pci_movable_bars_enabled())
> +				reset_resource(res);
>  		}
>  	}
>  }
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index d8ca40a97693..732d18f60f1b 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
>  
>  	bus = dev->bus;
>  	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
> +		if (pci_movable_bars_enabled()) {
> +			if (resno >= PCI_BRIDGE_RESOURCES &&
> +			    resno <= PCI_BRIDGE_RESOURCE_END) {
> +				struct resource *res = dev->resource + resno;
> +
> +				res->start = 0;
> +				res->end = 0;
> +				res->flags = 0;
> +			}
> +			break;
> +		}
> +
>  		if (!bus->parent || !bus->self->transparent)
>  			break;
>  		bus = bus->parent;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 17/21] PCI: Calculate boundaries for bridge windows
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 21:01   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 21:01 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:18PM +0300, Sergey Miroshnichenko wrote:
> If a bridge window contains fixed areas (there are PCIe devices with
> immovable BARs located on this bus), 

I think what you mean by "immovable BARs" is "drivers that don't
support moving BARs".  I want to keep the concept of legacy and EA
resources separate because those are immovable in principle, but
drivers can always be improved.

> this window must be allocated
> within the bound memory area, limited by windows size and by address
> range of fixed resources, calculated as follows:
> 
>            | <--     bus's fixed_range_hard   --> |
>   | <--  fixed_range_hard.end - window size   --> |
>            | <--  fixed_range_hard.start + window size   --> |
>   | <--                bus's fixed_range_soft            --> |
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-bus.c | 56 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h     |  4 ++-
>  2 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index a1fd7f3c5ea8..f4737339d5ec 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1809,6 +1809,61 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
>  }
>  #endif
>  
> +static void pci_bus_update_fixed_range_soft(struct pci_bus *bus)
> +{
> +	struct pci_dev *dev;
> +	struct pci_bus *parent = bus->parent;
> +	int idx;
> +
> +	list_for_each_entry(dev, &bus->devices, bus_list)
> +		if (dev->subordinate)
> +			pci_bus_update_fixed_range_soft(dev->subordinate);
> +
> +	if (!parent || !bus->self)
> +		return;
> +
> +	for (idx = 0; idx < ARRAY_SIZE(bus->fixed_range_hard); ++idx) {
> +		struct resource *r;
> +		resource_size_t soft_start, soft_end;
> +		resource_size_t hard_start = bus->fixed_range_hard[idx].start;
> +		resource_size_t hard_end = bus->fixed_range_hard[idx].end;
> +
> +		if (hard_start > hard_end)
> +			continue;
> +
> +		r = bus->resource[idx];
> +
> +		soft_start = hard_end - resource_size(r) + 1;
> +		soft_end = hard_start + resource_size(r) - 1;
> +
> +		if (soft_start > hard_start)
> +			soft_start = hard_start;
> +
> +		if (soft_end < hard_end)
> +			soft_end = hard_end;
> +
> +		list_for_each_entry(dev, &parent->devices, bus_list) {
> +			struct pci_bus *sibling = dev->subordinate;
> +			resource_size_t s_start, s_end;
> +
> +			if (!sibling || sibling == bus)
> +				continue;
> +
> +			s_start = sibling->fixed_range_hard[idx].start;
> +			s_end = sibling->fixed_range_hard[idx].end;
> +
> +			if (s_start > s_end)
> +				continue;
> +
> +			if (s_end < hard_start && s_end > soft_start)
> +				soft_start = s_end;
> +		}
> +
> +		bus->fixed_range_soft[idx].start = soft_start;
> +		bus->fixed_range_soft[idx].end = soft_end;
> +	}
> +}
> +
>  /*
>   * first try will not touch pci bridge res
>   * second and later try will clear small leaf bridge res
> @@ -1847,6 +1902,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
>  	/* Depth first, calculate sizes and alignments of all
>  	   subordinate buses. */
>  	__pci_bus_size_bridges(bus, add_list);
> +	pci_bus_update_fixed_range_soft(bus);
>  
>  	/* Depth last, allocate resources and update the hardware. */
>  	__pci_bus_assign_resources(bus, add_list, &fail_head);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 7a4d62d84bc1..75a56db73ad4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -574,9 +574,11 @@ struct pci_bus {
>  
>  	/*
>  	 * If there are fixed resources in the bridge window, the hard range
> -	 * contains the lowest and the highest addresses of them.
> +	 * contains the lowest and the highest addresses of them, and this
> +	 * bridge window must reside within the soft range.
>  	 */
>  	struct resource fixed_range_hard[PCI_BRIDGE_RESOURCE_NUM];
> +	struct resource fixed_range_soft[PCI_BRIDGE_RESOURCE_NUM];
>  
>  	struct pci_ops	*ops;		/* Configuration access functions */
>  	struct msi_controller *msi;	/* MSI controller */
> -- 
> 2.20.1
> 
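
For anyone following the soft-range arithmetic in the quoted patch, here is
a minimal user-space sketch of the per-window calculation. It is not the
kernel code: struct range and soft_range() are hypothetical names, the
clamp to address 0 is an addition of the sketch, and the narrowing against
sibling buses from the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, like the start/end pair of a struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Bounds within which a window of win_size bytes can be placed while
 * still covering the hard range (lowest..highest fixed address).
 * Assumes hard.start + win_size does not overflow 64 bits.
 */
static struct range soft_range(struct range hard, uint64_t win_size)
{
	struct range soft;

	assert(hard.start <= hard.end && win_size > 0);

	/* Lowest start: the window ends exactly at hard.end (clamped to 0). */
	soft.start = (win_size > hard.end) ? 0 : hard.end - win_size + 1;
	/* Highest end: the window starts exactly at hard.start. */
	soft.end = hard.start + win_size - 1;

	/* A window smaller than the hard range must not shrink the bounds. */
	if (soft.start > hard.start)
		soft.start = hard.start;
	if (soft.end < hard.end)
		soft.end = hard.end;

	return soft;
}
```

With a hard range of 0x10000..0x10fff and a 0x4000-byte window, this gives
a soft range of 0xd000..0x13fff, matching the diagram in the commit log.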

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 20/21] PCI: pciehp: Add support for the movable BARs feature
  2019-03-11 13:31   ` Sergey Miroshnichenko
  (?)
@ 2019-03-26 21:11   ` Bjorn Helgaas
  -1 siblings, 0 replies; 76+ messages in thread
From: Bjorn Helgaas @ 2019-03-26 21:11 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linuxppc-dev, linux

On Mon, Mar 11, 2019 at 04:31:21PM +0300, Sergey Miroshnichenko wrote:
> With movable BARs, adding a hotplugged device may affect all the PCIe
> domain starting from the root, so use a pci_rescan_bus() function which
> handles the rearrangement of existing BARs and bridge windows.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/hotplug/pciehp_pci.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
> index b9c1396db6fe..7c0871db5bae 100644
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -56,12 +56,16 @@ int pciehp_configure_device(struct controller *ctrl)
>  		goto out;
>  	}
>  
> -	for_each_pci_bridge(dev, parent)
> -		pci_hp_add_bridge(dev);
> +	if (pci_movable_bars_enabled()) {
> +		pci_rescan_bus(parent);
> +	} else {
> +		for_each_pci_bridge(dev, parent)
> +			pci_hp_add_bridge(dev);
>  
> -	pci_assign_unassigned_bridge_resources(bridge);
> -	pcie_bus_configure_settings(parent);
> -	pci_bus_add_devices(parent);
> +		pci_assign_unassigned_bridge_resources(bridge);
> +		pcie_bus_configure_settings(parent);
> +		pci_bus_add_devices(parent);
> +	}

The addition of a second path at this level, i.e., different paths
depending on whether movable BARs are enabled, seems a little
problematic because it's hard to determine whether they're equivalent
except for the movable BAR aspect.  For example, you don't call
pci_hp_add_bridge() when movable BARs are enabled, and I can't tell
whether that's intentional or whether it's a problem.

This looks like the sort of change that should be made in other
hotplug paths, e.g., enable_slot() for acpiphp,
pcibios_finish_adding_to_bus() for powerpc (maybe? I can't really
tell), cpci_configure_slot(), shpchp_configure_device()?

If we have or could invent some top-level interface that all these
places could use, and somewhere inside that we could do the movable
BAR magic, I think that would make it more maintainable.

>   out:
>  	pci_unlock_rescan_remove();
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED
  2019-03-26 20:28   ` Bjorn Helgaas
@ 2019-03-27 17:03     ` David Laight
  2019-03-27 17:39       ` Sergey Miroshnichenko
  0 siblings, 1 reply; 76+ messages in thread
From: David Laight @ 2019-03-27 17:03 UTC (permalink / raw)
  To: 'Bjorn Helgaas', Sergey Miroshnichenko
  Cc: linux-pci, linuxppc-dev, linux

From: Bjorn Helgaas
> Sent: 26 March 2019 20:29
> 
> On Mon, Mar 11, 2019 at 04:31:10PM +0300, Sergey Miroshnichenko wrote:
> > If a PCIe device driver doesn't yet have support for movable BARs,
> > mark device's BARs with IORESOURCE_PCI_FIXED.
> 
> I'm hesitant about using IORESOURCE_PCI_FIXED for this purpose.  That
> was originally added to describe resources that can not be changed
> because they're hardwired in the device, e.g., legacy resources and
> Enhanced Allocation resources.
> 
> In general, I think the bits in res->flags should tell us things about
> the hardware.  This particular use would be something about the
> *driver*, and I think we should figure that out by looking at
> dev->driver.

There will also be drivers that don't support BARs being moved,
but may be in a state (i.e. not actually open) where they can go
through a remove-rescan sequence to allow the BAR to be moved.

This might even be true if the open count is non-zero.

	David

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()
  2019-03-26 19:00     ` Bjorn Helgaas
@ 2019-03-27 17:11       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux, Srinath Mannam, Marta Rybczynska,
	linux-kernel


On 3/26/19 10:00 PM, Bjorn Helgaas wrote:
> [+cc Srinath, Marta, LKML]
> 
> On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>>  CPU0                                      CPU1
>>
>>  pci_enable_device_mem()                   pci_enable_device_mem()
>>    pci_enable_bridge()                       pci_enable_bridge()
>>      pci_is_enabled()
>>        return false;
>>      atomic_inc_return(enable_cnt)
>>      Start actual enabling the bridge
>>      ...                                       pci_is_enabled()
>>      ...                                         return true;
>>      ...                                   Start memory requests <-- FAIL
>>      ...
>>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
>> with mutexes.
> 
> This is a subtle issue that we've tried to fix before, but we've never
> had a satisfactory solution, so I hope you've figured out the right
> fix.
> 
> I'll include some links to previous discussion.  This patch is very
> similar to [2], which we didn't actually apply.  We did apply the
> patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
> upstream bridges"), but it caused the regressions reported in [4,5],
> so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
> enabling upstream bridges"").
> 

Thanks for the links, I wasn't aware of these discussions and patches!

On PowerNV this issue is partially hidden by db2173198b95 ("powerpc/powernv/pci: Work
around races in PCI bridge enabling"), and on x86 the BIOS pre-initializes all the bridges,
so it doesn't reproduce until a device is hotplugged into a hotplugged bridge.

This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"), but instead of a single static mutex it adds per-device mutexes and prevents
dev->enable_cnt from being incremented too early. So it is no longer necessary to carefully
pick a moment that is safe enough to enable the device.
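
The ordering can be illustrated with a small user-space sketch, using a
pthread mutex in place of the kernel mutex. All names here (sketch_dev,
sketch_enable() and so on) are hypothetical stand-ins, not the patch
itself: the point is only that the counter is updated under the lock and
after the enabling work has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct sketch_dev {
	pthread_mutex_t enable_mutex;	/* serializes enable attempts */
	int enable_cnt;			/* successful enables so far */
	int hw_enable_calls;		/* times the "hardware" was programmed */
};

static void sketch_dev_init(struct sketch_dev *dev)
{
	pthread_mutex_init(&dev->enable_mutex, NULL);
	dev->enable_cnt = 0;
	dev->hw_enable_calls = 0;
}

/* Stand-in for the real enabling work; always succeeds here. */
static int sketch_hw_enable(struct sketch_dev *dev)
{
	dev->hw_enable_calls++;
	return 0;
}

static int sketch_enable(struct sketch_dev *dev)
{
	int err = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->enable_mutex);

	/* Program the hardware only for the first caller. */
	if (dev->enable_cnt == 0)
		err = sketch_hw_enable(dev);

	/*
	 * The count changes only after the work is done, and only under
	 * the lock, so no other caller can observe "enabled" too early.
	 */
	if (err == 0)
		dev->enable_cnt++;

	pthread_mutex_unlock(&dev->enable_mutex);
	return err;
}
```

The sketch counts every successful caller so that enables and disables
stay balanced; that is a choice made for the illustration.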

Serge

> I think the underlying design problem is that we have a driver for
> device B calling pci_enable_device(), and it is changing the state of
> device A (an upstream bridge).  The model generally is that a driver
> should only touch the device it is bound to.
> 
> It's tricky to get the locking right when several children of device A
> all need to operate on A.
> 
> That's all to say I'll have to think carefully about this particular
> patch, so I'll go on to the others and come back to this one.
> 
> Bjorn
> 
> [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>     [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently
> 
> [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@kernel.dk/T/#u
>     iwlwifi firmware load broken in current -git
> 
> [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
>     [RFC PATCH] nvme: avoid race-conditions when enabling devices
> 
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
>>  drivers/pci/probe.c |  1 +
>>  include/linux/pci.h |  1 +
>>  3 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index f006068be209..895201d4c9e6 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  	struct pci_dev *bridge;
>>  	int retval;
>>  
>> +	mutex_lock(&dev->enable_mutex);
>> +
>>  	bridge = pci_upstream_bridge(dev);
>>  	if (bridge)
>>  		pci_enable_bridge(bridge);
>> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  	if (pci_is_enabled(dev)) {
>>  		if (!dev->is_busmaster)
>>  			pci_set_master(dev);
>> +		mutex_unlock(&dev->enable_mutex);
>>  		return;
>>  	}
>>  
>> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  		pci_err(dev, "Error enabling bridge (%d), continuing\n",
>>  			retval);
>>  	pci_set_master(dev);
>> +	mutex_unlock(&dev->enable_mutex);
>>  }
>>  
>>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  {
>>  	struct pci_dev *bridge;
>> +	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
>> +	bool need_lock = !dev->subordinate;
>>  	int err;
>>  	int i, bars = 0;
>>  
>> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>>  	}
>>  
>> -	if (atomic_inc_return(&dev->enable_cnt) > 1)
>> +	if (need_lock)
>> +		mutex_lock(&dev->enable_mutex);
>> +	if (pci_is_enabled(dev)) {
>> +		if (need_lock)
>> +			mutex_unlock(&dev->enable_mutex);
>>  		return 0;		/* already enabled */
>> +	}
>>  
>>  	bridge = pci_upstream_bridge(dev);
>>  	if (bridge)
>> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  			bars |= (1 << i);
>>  
>>  	err = do_pci_enable_device(dev, bars);
>> -	if (err < 0)
>> -		atomic_dec(&dev->enable_cnt);
>> +	if (err >= 0)
>> +		atomic_inc(&dev->enable_cnt);
>> +	if (need_lock)
>> +		mutex_unlock(&dev->enable_mutex);
>>  	return err;
>>  }
>>  
>> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
>>  	if (dr)
>>  		dr->enabled = 0;
>>  
>> +	mutex_lock(&dev->enable_mutex);
>>  	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
>>  		      "disabling already-disabled device");
>>  
>> -	if (atomic_dec_return(&dev->enable_cnt) != 0)
>> +	if (atomic_dec_return(&dev->enable_cnt) != 0) {
>> +		mutex_unlock(&dev->enable_mutex);
>>  		return;
>> +	}
>>  
>>  	do_pci_disable_device(dev);
>>  
>>  	dev->is_busmaster = 0;
>> +
>> +	mutex_unlock(&dev->enable_mutex);
>>  }
>>  EXPORT_SYMBOL(pci_disable_device);
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2ec0df04e0dc..977a127ce791 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>>  	INIT_LIST_HEAD(&dev->bus_list);
>>  	dev->dev.type = &pci_dev_type;
>>  	dev->bus = pci_bus_get(bus);
>> +	mutex_init(&dev->enable_mutex);
>>  
>>  	return dev;
>>  }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 77448215ef5b..cb2760a31fe2 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -419,6 +419,7 @@ struct pci_dev {
>>  	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
>>  	pci_dev_flags_t dev_flags;
>>  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
>> +	struct mutex	enable_mutex;
>>  
>>  	u32		saved_config_space[16]; /* Config space saved at suspend time */
>>  	struct hlist_head saved_cap_space;
>> -- 
>> 2.20.1
>>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 03/21] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2019-03-26 19:13   ` Bjorn Helgaas
@ 2019-03-27 17:13       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:13 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linuxppc-dev, linux

On 3/26/19 10:13 PM, Bjorn Helgaas wrote:
> On Mon, Mar 11, 2019 at 04:31:04PM +0300, Sergey Miroshnichenko wrote:
>> After updating the bridge window resources, the PCI_COMMAND_IO and
>> PCI_COMMAND_MEMORY bits of the bridge must be addressed as well.
>>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/pci.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 895201d4c9e6..69898fe5255e 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1622,6 +1622,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  		pci_enable_bridge(bridge);
>>  
>>  	if (pci_is_enabled(dev)) {
>> +		int i, bars = 0;
>> +
>> +		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
>> +			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
>> +				bars |= (1 << i);
>> +		}
>> +		do_pci_enable_device(dev, bars);
> 
> In what situation is this needed, exactly?  This code already exists
> in pci_enable_device_flags().  Why isn't that enough?
> 
> I guess maybe there's some case where we enable the bridge, then
> assign bridge windows, then enable a downstream device?
> 
> Does this fix a bug with current hotplug?
> 

Yes, this change fixes a real issue: pci_enable_device_flags() returns early if
pci_is_enabled() is already true for the device, so if a bridge was enabled before the
hotplug event but without MEM and/or IO set, these bits will not be set even if a new
device needs them.

I've chosen pci_enable_bridge() for this snippet because it recursively updates all
the parent bridges.
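
As a rough user-space illustration of the mask built in the quoted hunk
(one bit per window resource that has an I/O or memory type), assuming
made-up flag values rather than the kernel's IORESOURCE_* constants;
window_mask() and the SKETCH_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits only, not the kernel's IORESOURCE_* values. */
#define SKETCH_RES_IO	0x1u
#define SKETCH_RES_MEM	0x2u

/*
 * Build a bitmask with bit i set for every resource index i in
 * [first, end) whose flags mark it as an I/O or memory resource.
 */
static unsigned int window_mask(const unsigned int *flags,
				size_t first, size_t end)
{
	unsigned int mask = 0;
	size_t i;

	/* Every index must fit into the bits of the mask. */
	assert(end <= 8 * sizeof(mask));

	for (i = first; i < end; i++)
		if (flags[i] & (SKETCH_RES_IO | SKETCH_RES_MEM))
			mask |= 1u << i;

	return mask;
}
```

For flags { MEM, 0, 0, IO, 0, MEM } and the index range 3..5, the result is
0x28: only the resources in the scanned range contribute, which is why the
hunk starts its loop at PCI_BRIDGE_RESOURCES.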

Serge

>>  		if (!dev->is_busmaster)
>>  			pci_set_master(dev);
>>  		mutex_unlock(&dev->enable_mutex);
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 05/21] PCI: hotplug: Add a flag for the movable BARs feature
  2019-03-26 19:24   ` Bjorn Helgaas
@ 2019-03-27 17:16       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:16 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linuxppc-dev, linux

On 3/26/19 10:24 PM, Bjorn Helgaas wrote:
> On Mon, Mar 11, 2019 at 04:31:06PM +0300, Sergey Miroshnichenko wrote:
>> If a new PCIe device has been hot-plugged between the two active ones
>> without big enough gap between their BARs, 
> 
> Just to speak precisely here, a hot-added device is not "between" two
> active ones because the new device has zeros in its BARs.
> 
> BARs from different devices can be interleaved arbitrarily, subject to
> bridge window constraints, so we can really only speak about a *BAR*
> (not the entire device) being between two other BARs.
> 
> Also, I don't think there's anything here that is PCIe-specific, so we
> should talk about "PCI", not "PCIe".
> 

I agree, that should be rephrased. This patchset intends to solve the problem of a bridge
window that is not big enough (or is too fragmented) to fit new BARs and cannot be
expanded far enough because it is blocked by "neighboring" BARs.
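
The fragmentation case can be sketched in userspace C as below. This is only a model
under simplifying assumptions: all names are hypothetical, the BAR array must be sorted
by start address, and BAR alignment requirements are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_bar {
	uint64_t start;
	uint64_t size;
};

/* Is there a single free gap of new_size bytes in [win_start, win_end)? */
static bool fits_without_moving(const struct model_bar *bars, size_t n,
				uint64_t win_start, uint64_t win_end,
				uint64_t new_size)
{
	uint64_t cursor = win_start;
	size_t i;

	assert(bars != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (bars[i].start - cursor >= new_size)
			return true;
		cursor = bars[i].start + bars[i].size;
	}
	return win_end - cursor >= new_size;
}

/* Would it fit if the existing BARs were packed together first? */
static bool fits_after_moving(const struct model_bar *bars, size_t n,
			      uint64_t win_start, uint64_t win_end,
			      uint64_t new_size)
{
	uint64_t used = 0;
	size_t i;

	assert(bars != NULL || n == 0);
	for (i = 0; i < n; i++)
		used += bars[i].size;
	return (win_end - win_start) - used >= new_size;
}
```

With two 0x1000-byte BARs at 0x1000 and 0x3000 inside a window [0x1000, 0x5000), a new
0x2000-byte BAR fits only after the existing BARs are moved.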

>> these BARs should be moved
>> if their drivers support this feature. The drivers should be notified
>> and paused during the procedure:
>>
>> 1)                 dev 8 (new)
>>                        |
>>                        v
>> .. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
>> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
>>
>> 2)                             dev 8
>>                                  |
>>                                  v
>> .. |  dev 3  |  dev 3  | -->           --> |  dev 5  |  dev 7  |
>> .. |  BAR 0  |  BAR 1  | -->           --> |  BAR 0  |  BAR 0  |
>>
>>  3)
>>
>> .. |  dev 3  |  dev 3  |  dev 8  |  dev 8  |  dev 5  |  dev 7  |
>> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
>>
>> Thus, prior reservation of memory regions by BIOS/bootloader/firmware
>> is not required anymore for the PCIe hotplug.
>>
>> The PCI_MOVABLE_BARS flag is set by the platform if this feature is
>> supported and tested, but can be overridden by the following command
>> line option:
>>     pcie_movable_bars={ off | force }
> 
> A chicken switch to turn this functionality off is OK, but I think it
> should be enabled by default.  There isn't anything about this that's
> platform-specific, is there?
> 

I'm a bit wary of assuming that: I was once surprised to learn that bus numbers can't be
assigned arbitrarily on some platforms [1], so there may well be similar restrictions on
BARs too.

I was going to propose adding pci_add_flags(PCI_MOVABLE_BARS) to arch/.../init.c for
tested platforms, so that fewer people are upset by their BARs suddenly breaking. But
this logic can be reversed: pci_clear_flags(PCI_MOVABLE_BARS) for platforms where movable
BARs can't work.
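
The two defaults can be compared with a small userspace sketch. The enum and the
function names are hypothetical; the precedence (off first, then force, then the
platform default) follows pci_movable_bars_enabled() from the quoted patch, except that
the patch keeps two separate booleans instead of one enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum override { OVERRIDE_NONE, OVERRIDE_OFF, OVERRIDE_FORCE };

/* Same string handling as pcie_movable_bars_setup(). */
static enum override parse_override(const char *str)
{
	assert(str != NULL);
	if (!strcmp(str, "off"))
		return OVERRIDE_OFF;
	if (!strcmp(str, "force"))
		return OVERRIDE_FORCE;
	return OVERRIDE_NONE;
}

/* Opt-in: disabled unless the platform was marked as tested. */
static bool enabled_opt_in(enum override ov, bool platform_tested)
{
	if (ov == OVERRIDE_OFF)
		return false;
	if (ov == OVERRIDE_FORCE)
		return true;
	return platform_tested;
}

/* Opt-out: enabled unless the platform was marked as broken. */
static bool enabled_opt_out(enum override ov, bool platform_broken)
{
	if (ov == OVERRIDE_OFF)
		return false;
	if (ov == OVERRIDE_FORCE)
		return true;
	return !platform_broken;
}
```

On a platform that nobody has looked at yet and with no command line option given, the
opt-in variant returns false and the opt-out variant returns true. That is the whole
difference between the two proposals.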

Serge

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-September/178103.html

>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  7 ++++++
>>  drivers/pci/pci.c                             | 24 +++++++++++++++++++
>>  include/linux/pci.h                           |  2 ++
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 2b8ee90bb644..d40eaf993f80 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3417,6 +3417,13 @@
>>  		nomsi	Do not use MSI for native PCIe PME signaling (this makes
>>  			all PCIe root ports use INTx for all services).
>>  
>> +	pcie_movable_bars=[PCIE]
>> +			Override the movable BARs support detection:
>> +		off
>> +			Disable even if supported by the platform
>> +		force
>> +			Enable even if not explicitly declared as supported
>> +
>>  	pcmv=		[HW,PCMCIA] BadgePAD 4
>>  
>>  	pd_ignore_unused
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 69898fe5255e..4dac49a887ec 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
>>  }
>>  __setup("pcie_port_pm=", pcie_port_pm_setup);
>>  
>> +static bool pcie_movable_bars_off;
>> +static bool pcie_movable_bars_force;
>> +static int __init pcie_movable_bars_setup(char *str)
>> +{
>> +	if (!strcmp(str, "off"))
>> +		pcie_movable_bars_off = true;
>> +	else if (!strcmp(str, "force"))
>> +		pcie_movable_bars_force = true;
>> +	return 1;
>> +}
>> +__setup("pcie_movable_bars=", pcie_movable_bars_setup);
>> +
>> +bool pci_movable_bars_enabled(void)
>> +{
>> +	if (pcie_movable_bars_off)
>> +		return false;
>> +
>> +	if (pcie_movable_bars_force)
>> +		return true;
>> +
>> +	return pci_has_flag(PCI_MOVABLE_BARS);
>> +}
>> +EXPORT_SYMBOL(pci_movable_bars_enabled);
>> +
>>  /* Time to wait after a reset for device to become responsive */
>>  #define PCIE_RESET_READY_POLL_MS 60000
>>  
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index cb2760a31fe2..cbe661aff9f5 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -866,6 +866,7 @@ enum {
>>  	PCI_ENABLE_PROC_DOMAINS	= 0x00000010,	/* Enable domains in /proc */
>>  	PCI_COMPAT_DOMAIN_0	= 0x00000020,	/* ... except domain 0 */
>>  	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
>> +	PCI_MOVABLE_BARS	= 0x00000080,	/* Runtime BAR reassign after hotplug */
>>  };
>>  
>>  /* These external functions are only available when PCI support is enabled */
>> @@ -1345,6 +1346,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
>>  void pci_setup_bridge(struct pci_bus *bus);
>>  resource_size_t pcibios_window_alignment(struct pci_bus *bus,
>>  					 unsigned long type);
>> +bool pci_movable_bars_enabled(void);
>>  
>>  #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
>>  #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 08/21] nvme-pci: Handle movable BARs
  2019-03-26 20:20     ` Bjorn Helgaas
  (?)
@ 2019-03-27 17:30       ` Sergey Miroshnichenko
  -1 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linuxppc-dev, linux, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, linux-nvme, linux-kernel

On 3/26/19 11:20 PM, Bjorn Helgaas wrote:
> [+cc Keith, Jens, Christoph, Sagi, linux-nvme, LKML]
> 
> On Mon, Mar 11, 2019 at 04:31:09PM +0300, Sergey Miroshnichenko wrote:
>> Hotplugged devices can affect the existing ones by moving their BARs.
>> PCI subsystem will inform the NVME driver about this by invoking
>> reset_prepare()+reset_done(), then iounmap()+ioremap() must be called.
> 
> Do you mean the PCI core will invoke ->rescan_prepare() and
> ->rescan_done() (as opposed to *reset*)?
> 

Yes, of course, sorry for the confusion!

These are new callbacks, so drivers can explicitly declare their support for movable BARs,
and the PCI core can detect the drivers that don't and treat the corresponding BARs as
immovable for now.
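
The intended order of operations (pause, move, resume) can be sketched in userspace C as
follows. The types and function names are simplified stand-ins, not the real
struct pci_driver or the PCI core code, and "moving" is reduced to adding an offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_dev;

/* Stand-in for the two hooks added to struct pci_driver. */
struct model_driver {
	void (*rescan_prepare)(struct model_dev *dev);
	void (*rescan_done)(struct model_dev *dev);
};

struct model_dev {
	const struct model_driver *driver;
	uint64_t bar_start;
	bool paused;
};

/* Example hooks: a real driver would stop I/O and iounmap()/ioremap() here. */
static void model_prepare(struct model_dev *dev) { dev->paused = true; }
static void model_done(struct model_dev *dev) { dev->paused = false; }

static bool has_hooks(const struct model_dev *dev)
{
	return dev->driver && dev->driver->rescan_prepare &&
	       dev->driver->rescan_done;
}

/* Shift the BARs of all capable devices by 'shift' bytes. */
static void move_bars(struct model_dev *devs, size_t n, uint64_t shift)
{
	size_t i;

	for (i = 0; i < n; i++)  /* 1. pause the drivers */
		if (has_hooks(&devs[i]))
			devs[i].driver->rescan_prepare(&devs[i]);

	for (i = 0; i < n; i++) {  /* 2. move only what is paused */
		if (!has_hooks(&devs[i]))
			continue;  /* treated as immovable */
		assert(devs[i].paused);
		devs[i].bar_start += shift;
	}

	for (i = 0; i < n; i++)  /* 3. let the drivers remap and resume */
		if (has_hooks(&devs[i]))
			devs[i].driver->rescan_done(&devs[i]);
}
```

A device whose driver leaves the hooks unset keeps its BAR address, which is the
"can't be moved for now" case.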

>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/nvme/host/pci.c | 29 +++++++++++++++++++++++++++--
>>  1 file changed, 27 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 92bad1c810ac..ccea3033a67a 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -106,6 +106,7 @@ struct nvme_dev {
>>  	unsigned int num_vecs;
>>  	int q_depth;
>>  	u32 db_stride;
>> +	resource_size_t current_phys_bar;
>>  	void __iomem *bar;
>>  	unsigned long bar_mapped_size;
>>  	struct work_struct remove_work;
>> @@ -1672,13 +1673,16 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
>>  {
>>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
>>  
>> -	if (size <= dev->bar_mapped_size)
>> +	if (dev->bar &&
>> +	    dev->current_phys_bar == pci_resource_start(pdev, 0) &&
>> +	    size <= dev->bar_mapped_size)
>>  		return 0;
>>  	if (size > pci_resource_len(pdev, 0))
>>  		return -ENOMEM;
>>  	if (dev->bar)
>>  		iounmap(dev->bar);
>> -	dev->bar = ioremap(pci_resource_start(pdev, 0), size);
>> +	dev->current_phys_bar = pci_resource_start(pdev, 0);
>> +	dev->bar = ioremap(dev->current_phys_bar, size);
> 
> dev->current_phys_bar is different from pci_resource_start() in the
> case where the PCI core has moved the nvme BAR, but nvme has not yet
> remapped it.
> 
> I'm not sure it's worth keeping track of current_phys_bar, as opposed
> to always unmapping and remapping.  Is this a performance path?  I
> think there are advantages to always exercising the same code path,
> regardless of whether the BAR happened to be moved, e.g., if there's a
> bug in the "BAR moved" path, it may be a heisenbug because whether we
> exercise that path depends on the current configuration.
> 
> If you do need to cache current_phys_bar, maybe this, so it's a little
> easier to see that you're not changing the ioremap() itself:
> 
>   dev->bar = ioremap(pci_resource_start(pdev, 0), size);
>   dev->current_phys_bar = pci_resource_start(pdev, 0);
> 

Oh, I see now. A rescan is a rather rare event, and unconditional remapping is simpler and
therefore a bit more resistant to bugs.
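
A userspace sketch of the unconditional variant, with the mapping reduced to plain
bookkeeping. struct model_nvme and remap_bar() are illustrative only and do not call
ioremap() or iounmap(); the point is that there is a single code path and no cached
physical address to compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_nvme {
	uint64_t res_start;    /* stands in for pci_resource_start(pdev, 0) */
	uint64_t res_len;      /* stands in for pci_resource_len(pdev, 0) */
	bool mapped;           /* stands in for dev->bar != NULL */
	uint64_t mapped_phys;  /* physical address behind the mapping */
	uint64_t mapped_size;  /* stands in for dev->bar_mapped_size */
	unsigned int remaps;   /* number of mappings created so far */
};

/* Always drop the old mapping and create a new one. */
static int remap_bar(struct model_nvme *dev, uint64_t size)
{
	assert(dev != NULL);
	if (size > dev->res_len)
		return -1;  /* the driver returns -ENOMEM here */

	dev->mapped = false;  /* iounmap() in the driver */
	dev->mapped_phys = dev->res_start;  /* ioremap() in the driver */
	dev->mapped_size = size;
	dev->mapped = true;
	dev->remaps++;
	return 0;
}
```

The same path runs whether or not the BAR has moved, so a bug in it cannot hide behind a
particular memory layout.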

>>  	if (!dev->bar) {
>>  		dev->bar_mapped_size = 0;
>>  		return -ENOMEM;
>> @@ -2504,6 +2508,8 @@ static void nvme_reset_work(struct work_struct *work)
>>  	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING))
>>  		goto out;
>>  
>> +	nvme_remap_bar(dev, db_bar_size(dev, 0));
> 
> How is this change connected to rescan?  This looks reset-related.
> 

Thanks for catching that! This is also a leftover from an early stage of this patchset,
when reset_done() (which is rescan_done() now) just initiated an NVME reset.

Best regards,
Serge

>>  	/*
>>  	 * If we're called to reset a live controller first shut it down before
>>  	 * moving on.
>> @@ -2910,6 +2916,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
>>  	flush_work(&dev->ctrl.reset_work);
>>  }
>>  
>> +void nvme_rescan_prepare(struct pci_dev *pdev)
>> +{
>> +	struct nvme_dev *dev = pci_get_drvdata(pdev);
>> +
>> +	nvme_dev_disable(dev, false);
>> +	nvme_dev_unmap(dev);
>> +	dev->bar = NULL;
>> +}
>> +
>> +void nvme_rescan_done(struct pci_dev *pdev)
>> +{
>> +	struct nvme_dev *dev = pci_get_drvdata(pdev);
>> +
>> +	nvme_dev_map(dev);
>> +	nvme_reset_ctrl_sync(&dev->ctrl);
>> +}
>> +
>>  static const struct pci_error_handlers nvme_err_handler = {
>>  	.error_detected	= nvme_error_detected,
>>  	.slot_reset	= nvme_slot_reset,
>> @@ -2974,6 +2997,8 @@ static struct pci_driver nvme_driver = {
>>  	},
>>  	.sriov_configure = pci_sriov_configure_simple,
>>  	.err_handler	= &nvme_err_handler,
>> +	.rescan_prepare	= nvme_rescan_prepare,
>> +	.rescan_done	= nvme_rescan_done,
>>  };
>>  
>>  static int __init nvme_init(void)
>> -- 
>> 2.20.1
>>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED
  2019-03-27 17:03     ` David Laight
@ 2019-03-27 17:39       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:39 UTC (permalink / raw)
  To: David Laight, 'Bjorn Helgaas'; +Cc: linux-pci, linuxppc-dev, linux

On 3/27/19 8:03 PM, David Laight wrote:
> From: Bjorn Helgaas
>> Sent: 26 March 2019 20:29
>>
>> On Mon, Mar 11, 2019 at 04:31:10PM +0300, Sergey Miroshnichenko wrote:
>>> If a PCIe device driver doesn't yet have support for movable BARs,
>>> mark device's BARs with IORESOURCE_PCI_FIXED.
>>
>> I'm hesitant about using IORESOURCE_PCI_FIXED for this purpose.  That
>> was originally added to describe resources that can not be changed
>> because they're hardwired in the device, e.g., legacy resources and
>> Enhanced Allocation resources.
>>
>> In general, I think the bits in res->flags should tell us things about
>> the hardware.  This particular use would be something about the
>> *driver*, and I think we should figure that out by looking at
>> dev->driver.
> 
> There will also be drivers that don't support BARs being moved,
> but may be in a state (ie not actually open) where they can go
> through a remove-rescan sequence to allow the BAR be moved.
> 
> This might even be true if the open count is non-zero.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

This approach with IORESOURCE_PCI_FIXED was used because struct resource doesn't have a
pointer to its device (and so to its driver). But now that you have mentioned it, I can
see that every place where I use the FIXED flag to mark immovable resources also has the
corresponding struct pci_dev *dev nearby.

So, replacing every

    if (r->flags & IORESOURCE_PCI_FIXED)

with

    if (!dev->driver->rescan_prepare)

or something like

    if (!pci_dev_movable_bars_capable(dev))

will reduce this huge patchset a little, and will also make irrelevant a case I had
completely forgotten about: IORESOURCE_PCI_FIXED would have to be cleared when the
"immovable" driver is removed (rmmod).

Thanks a lot! I'll rework the changes in this way and resend it as v5.
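
A userspace sketch of why deriving the answer from the driver is more robust than caching
it in a flag. pci_dev_movable_bars_capable() is only the hypothetical helper named above,
the other names are made up, and treating an unbound device as movable is an assumption
of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

struct model_driver {
	void (*rescan_prepare)(struct model_dev *dev);
	void (*rescan_done)(struct model_dev *dev);
};

struct model_dev {
	const struct model_driver *driver;  /* NULL when no driver is bound */
	bool fixed_flag;  /* models IORESOURCE_PCI_FIXED on the BARs */
};

/* Flag approach: cache the answer when a driver binds. */
static void bind_driver_v4(struct model_dev *dev, const struct model_driver *drv)
{
	assert(dev != NULL && drv != NULL);
	dev->driver = drv;
	if (!drv->rescan_prepare)
		dev->fixed_flag = true;
}

/* Unbind that forgets to clear the flag, the forgotten case. */
static void unbind_driver(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->driver = NULL;
}

/* Derived approach: ask the currently bound driver every time. */
static bool pci_dev_movable_bars_capable(const struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->driver)
		return true;  /* assumption: BARs of an unbound device are unused */
	return dev->driver->rescan_prepare != NULL;
}
```

After unbind_driver() the cached flag still claims that the BARs are fixed, while the
helper already reports them as movable, with nothing to clean up on rmmod.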

Serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH RFC v4 11/21] PCI: Release and reassign the root bridge resources during rescan
  2019-03-26 20:41   ` Bjorn Helgaas
@ 2019-03-27 17:40       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 17:40 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linuxppc-dev, linux

On 3/26/19 11:41 PM, Bjorn Helgaas wrote:
> On Mon, Mar 11, 2019 at 04:31:12PM +0300, Sergey Miroshnichenko wrote:
>> When the movable BARs feature is enabled, don't rely on the memory gaps
>> reserved by the BIOS/bootloader/firmware, but instead rearrange the BARs
>> and bridge windows starting from the root.
>>
>> Endpoint device's BARs, after being released, are resorted and written
>> back by the pci_assign_unassigned_root_bus_resources().
>>
>> The last step of writing the recalculated windows to the bridges is done
>> by the new pci_setup_bridges() function.
>>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/pci.h       |  1 +
>>  drivers/pci/probe.c     | 22 ++++++++++++++++++++++
>>  drivers/pci/setup-bus.c | 11 ++++++++++-
>>  3 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 224d88634115..e06e8692a7b1 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -248,6 +248,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
>>  				struct list_head *realloc_head,
>>  				struct list_head *fail_head);
>>  bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>> +void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
>>  
>>  void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>>  void pci_disable_bridge_window(struct pci_dev *dev);
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 1cf6ec960236..692752c71f71 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -3299,6 +3299,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
>>  	pm_runtime_put(&bus->dev);
>>  }
>>  
>> +static void pci_setup_bridges(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child;
>> +
>> +		if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
>> +			continue;
>> +
>> +		child = dev->subordinate;
>> +		if (child)
>> +			pci_setup_bridges(child);
>> +	}
>> +
>> +	if (bus->self)
>> +		pci_setup_bridge(bus);
>> +}
>> +
>>  /**
>>   * pci_rescan_bus - Scan a PCI bus for devices
>>   * @bus: PCI bus to scan
>> @@ -3321,8 +3340,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>>  		pci_bus_rescan_prepare(root);
>>  
>>  		max = pci_scan_child_bus(root);
>> +
>> +		pci_bus_release_root_bridge_resources(root);
>>  		pci_assign_unassigned_root_bus_resources(root);
>>  
>> +		pci_setup_bridges(root);
>>  		pci_bus_rescan_done(root);
>>  	} else {
>>  		max = pci_scan_child_bus(bus);
>> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> index be7d4e6d7b65..36a1907d9509 100644
>> --- a/drivers/pci/setup-bus.c
>> +++ b/drivers/pci/setup-bus.c
>> @@ -1584,7 +1584,7 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
>>  		pci_printk(KERN_DEBUG, dev, "resource %d %pR released\n",
>>  					PCI_BRIDGE_RESOURCES + idx, r);
>>  		/* keep the old size */
>> -		r->end = resource_size(r) - 1;
>> +		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);
> 
> Doesn't this mean we're throwing away the information about the BAR
> size, and we'll have to size the BAR again somewhere?  I would like to
> avoid that.  But I don't know yet where you rely on this, so maybe
> it's not possible to avoid it.
> 

This resource is not a BAR but a bridge window. I'm freeing it intentionally so that
pbus_size_mem() can later recalculate a new size.
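
To make the difference concrete, here is a minimal userspace sketch of the three
assignments in the quoted hunk. It is not kernel code: struct model_res,
model_res_size() and model_release_window() are hypothetical stand-ins for
struct resource, resource_size() and pci_bridge_release_resources(), and the
movable_bars argument stands in for pci_movable_bars_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the start/end/flags fields of struct resource. */
struct model_res {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/* Same arithmetic as the kernel's resource_size(): the range is inclusive. */
static uint64_t model_res_size(const struct model_res *r)
{
	return r->end - r->start + 1;
}

/*
 * Mirrors the assignments in the quoted hunk. Without movable BARs the old
 * window size survives in r->end; with movable BARs it is dropped, so a
 * later sizing pass has to compute it from scratch.
 */
static void model_release_window(struct model_res *r, bool movable_bars)
{
	r->end = movable_bars ? 0 : model_res_size(r) - 1;
	r->start = 0;
	r->flags = 0;
}
```

Releasing a 1 MiB window at 0xe0000000 leaves end == 0xfffff in the first case
and end == 0 in the second; the BARs of the devices behind the bridge are
separate resources and are not touched by this function.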

Serge

>>  		r->start = 0;
>>  		r->flags = 0;
>>  
>> @@ -1637,6 +1637,15 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
>>  		pci_bridge_release_resources(bus, type);
>>  }
>>  
>> +void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
>> +{
>> +	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
>> +	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
>> +	pci_bus_release_bridge_resources(root_bus,
>> +					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
>> +					 whole_subtree);
>> +}
>> +
>>  static void pci_bus_dump_res(struct pci_bus *bus)
>>  {
>>  	struct resource *res;
>> -- 
>> 2.20.1
>>

* Re: [PATCH RFC v4 12/21] PCI: Don't allow hotplugged devices to steal resources
  2019-03-26 20:55   ` Bjorn Helgaas
@ 2019-03-27 18:02       ` Sergey Miroshnichenko
  0 siblings, 0 replies; 76+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-27 18:02 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linuxppc-dev, linux

On 3/26/19 11:55 PM, Bjorn Helgaas wrote:
> On Mon, Mar 11, 2019 at 04:31:13PM +0300, Sergey Miroshnichenko wrote:
>> When movable BARs are enabled, the PCI subsystem at first releases
>> all the bridge windows and then performs an attempt to assign new
>> requested resources and re-assign the existing ones.
> 
> s/performs an attempt/attempts/
> 
> I guess "new requested resources" means "resources to newly hotplugged
> devices"?
> 

Yes, that's exactly what I've tried to express :) Will rephrase that in v5.

>> If a hotplugged device gets its resources first, there could be no
>> space left to re-assign resources of already working devices, which
>> is unacceptable. If this happens, this patch marks one of the new
>> devices with the new introduced flag PCI_DEV_IGNORE and retries the
>> resource assignment.
>>
>> This patch adds a new res_mask bitmask to the struct pci_dev for
>> storing the indices of assigned resources.
>>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
>> ---
>>  drivers/pci/bus.c       |   5 ++
>>  drivers/pci/pci.h       |  11 +++++
>>  drivers/pci/probe.c     | 100 +++++++++++++++++++++++++++++++++++++++-
>>  drivers/pci/setup-bus.c |  15 ++++++
>>  include/linux/pci.h     |   1 +
>>  5 files changed, 130 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 5cb40b2518f9..a9784144d6f2 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -311,6 +311,11 @@ void pci_bus_add_device(struct pci_dev *dev)
>>  {
>>  	int retval;
>>  
>> +	if (pci_dev_is_ignored(dev)) {
>> +		pci_warn(dev, "%s: don't enable the ignored device\n", __func__);
>> +		return;
> 
> I'm not sure about this.  Even if we're unable to assign space for all
> the device's BARs, it still should respond to config accesses, and I
> think it should show up in sysfs and lspci.
> 

I agree, that would be better.

Also, this patch introduces a new issue to think about: how to recover BARs for such
devices once their neighbors have been removed and there is enough space again.

>> +	}
>> +
>>  	/*
>>  	 * Can not put in pci_device_add yet because resources
>>  	 * are not assigned yet for some devices.
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index e06e8692a7b1..56b905068ac5 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -366,6 +366,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>>  
>>  /* pci_dev priv_flags */
>>  #define PCI_DEV_ADDED 0
>> +#define PCI_DEV_IGNORE 1
>>  
>>  static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
>>  {
>> @@ -377,6 +378,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
>>  	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
>>  }
>>  
>> +static inline void pci_dev_ignore(struct pci_dev *dev, bool ignore)
>> +{
>> +	assign_bit(PCI_DEV_IGNORE, &dev->priv_flags, ignore);
>> +}
>> +
>> +static inline bool pci_dev_is_ignored(const struct pci_dev *dev)
>> +{
>> +	return test_bit(PCI_DEV_IGNORE, &dev->priv_flags);
>> +}
>> +
>>  #ifdef CONFIG_PCIEAER
>>  #include <linux/aer.h>
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 692752c71f71..62f4058a001f 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -3248,6 +3248,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>  	return max;
>>  }
>>  
>> +static unsigned int pci_dev_res_mask(struct pci_dev *dev)
>> +{
>> +	unsigned int res_mask = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
>> +		struct resource *r = &dev->resource[i];
>> +
>> +		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
>> +			continue;
>> +
>> +		res_mask |= (1 << i);
>> +	}
>> +
>> +	return res_mask;
>> +}
>> +
>>  static void pci_bus_rescan_prepare(struct pci_bus *bus)
>>  {
>>  	struct pci_dev *dev;
>> @@ -3257,6 +3274,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
>>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>>  		struct pci_bus *child = dev->subordinate;
>>  
>> +		dev->res_mask = pci_dev_res_mask(dev);
>> +
>>  		if (child) {
>>  			pci_bus_rescan_prepare(child);
>>  		} else if (dev->driver &&
>> @@ -3318,6 +3337,84 @@ static void pci_setup_bridges(struct pci_bus *bus)
>>  		pci_setup_bridge(bus);
>>  }
>>  
>> +static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
>> +{
>> +	struct pci_dev *dev;
>> +
>> +	if (!bus)
>> +		return NULL;
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child_bus = dev->subordinate;
>> +
>> +		if (!pci_dev_is_added(dev) && !pci_dev_is_ignored(dev))
>> +			return dev;
>> +
>> +		if (child_bus) {
>> +			struct pci_dev *next_new_dev;
>> +
>> +			next_new_dev = pci_find_next_new_device(child_bus);
>> +			if (next_new_dev)
>> +				return next_new_dev;
>> +		}
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static bool pci_bus_validate_resources(struct pci_bus *bus)
> 
> The name of this function should tell us what the return value means.
> Just from the name "pci_bus_validate_resources", I can't tell whether we
> call it for side-effects, or whether true or false indicates success.
> 

Sure, now I realize this too. Would pci_bus_check_all_bars_reassigned() be a better choice?
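
For reference, the condition behind the return value can be modelled in
userspace as below. This is only a sketch: struct model_bar, MODEL_NUM_BARS
and both function names are hypothetical, the loop bound is simplified to six
BARs (the patch iterates up to PCI_BRIDGE_RESOURCES), and the recursion over
child buses is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_BARS 6	/* hypothetical simplification of the loop bound */

struct model_bar {
	unsigned long flags;	/* 0 means the BAR is not implemented */
	bool unset;		/* stands in for IORESOURCE_UNSET */
	bool has_parent;	/* stands in for r->parent != NULL */
};

/* Same filter as pci_dev_res_mask(): one bit per BAR that is really assigned. */
static unsigned int model_res_mask(const struct model_bar *bars)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < MODEL_NUM_BARS; i++) {
		if (!bars[i].flags || bars[i].unset || !bars[i].has_parent)
			continue;
		mask |= 1u << i;
	}

	return mask;
}

/* True when every bit set before the rescan is still set after it. */
static bool model_all_bars_reassigned(unsigned int before, unsigned int after)
{
	return (before & ~after) == 0;
}
```

With this reading, "true" means success: a device that gains a BAR still
passes, while a device that loses one fails the check.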

>> +{
>> +	struct pci_dev *dev;
>> +	bool ret = true;
>> +
>> +	if (!bus)
>> +		return false;
>> +
>> +	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		struct pci_bus *child = dev->subordinate;
>> +		unsigned int res_mask = pci_dev_res_mask(dev);
>> +
>> +		if (pci_dev_is_ignored(dev))
>> +			continue;
>> +
>> +		if (dev->res_mask & ~res_mask) {
>> +			pci_err(dev, "%s: Non-re-enabled resources found: 0x%x -> 0x%x\n",
>> +				__func__, dev->res_mask, res_mask);
> 
> I don't think __func__ really tells users anything useful, so I would
> just omit them.  Searching for the text of the message is almost as
> good.
> 

Ok, I'll drop __func__'s.

Serge

>> +			ret = false;
>> +		}
>> +
>> +		if (child && !pci_bus_validate_resources(child))
>> +			ret = false;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static void pci_reassign_root_bus_resources(struct pci_bus *root)
>> +{
>> +	do {
>> +		struct pci_dev *next_new_dev;
>> +
>> +		pci_bus_release_root_bridge_resources(root);
>> +		pci_assign_unassigned_root_bus_resources(root);
>> +
>> +		if (pci_bus_validate_resources(root))
>> +			break;
>> +
>> +		next_new_dev = pci_find_next_new_device(root);
>> +		if (!next_new_dev) {
>> +			dev_err(&root->dev, "%s: failed to re-assign resources even after ignoring all the hotplugged devices\n",
>> +				__func__);
>> +			break;
>> +		}
>> +
>> +		dev_warn(&root->dev, "%s: failed to re-assign resources, disable the next hotplugged device %s and retry\n",
>> +			 __func__, dev_name(&next_new_dev->dev));
>> +
>> +		pci_dev_ignore(next_new_dev, true);
>> +	} while (true);
>> +}
>> +
>>  /**
>>   * pci_rescan_bus - Scan a PCI bus for devices
>>   * @bus: PCI bus to scan
>> @@ -3341,8 +3438,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>>  
>>  		max = pci_scan_child_bus(root);
>>  
>> -		pci_bus_release_root_bridge_resources(root);
>> -		pci_assign_unassigned_root_bus_resources(root);
>> +		pci_reassign_root_bus_resources(root);
>>  
>>  		pci_setup_bridges(root);
>>  		pci_bus_rescan_done(root);
>> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> index 36a1907d9509..551108f48df7 100644
>> --- a/drivers/pci/setup-bus.c
>> +++ b/drivers/pci/setup-bus.c
>> @@ -131,6 +131,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
>>  {
>>  	int i;
>>  
>> +	if (pci_dev_is_ignored(dev))
>> +		return;
>> +
>>  	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>>  		struct resource *r;
>>  		struct pci_dev_resource *dev_res, *tmp;
>> @@ -181,6 +184,9 @@ static void __dev_sort_resources(struct pci_dev *dev,
>>  {
>>  	u16 class = dev->class >> 8;
>>  
>> +	if (pci_dev_is_ignored(dev))
>> +		return;
>> +
>>  	/* Don't touch classless devices or host bridges or ioapics.  */
>>  	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
>>  		return;
>> @@ -284,6 +290,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
>>  	int idx;
>>  
>>  	list_for_each_entry(dev_res, head, list) {
>> +		if (pci_dev_is_ignored(dev_res->dev))
>> +			continue;
>> +
>>  		res = dev_res->res;
>>  		idx = res - &dev_res->dev->resource[0];
>>  		if (resource_size(res) &&
>> @@ -991,6 +1000,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>>  		int i;
>>  
>> +		if (pci_dev_is_ignored(dev))
>> +			continue;
>> +
>>  		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
>>  			struct resource *r = &dev->resource[i];
>>  			resource_size_t r_size;
>> @@ -1353,6 +1365,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
>>  	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
>>  
>>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>> +		if (pci_dev_is_ignored(dev))
>> +			continue;
>> +
>>  		pdev_assign_fixed_resources(dev);
>>  
>>  		b = dev->subordinate;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 3d52f5538282..26aa59cb6220 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -369,6 +369,7 @@ struct pci_dev {
>>  	 */
>>  	unsigned int	irq;
>>  	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
>> +	unsigned int	res_mask;		/* Bitmask of assigned resources */
>>  
>>  	bool		match_driver;		/* Skip attaching driver */
>>  
>> -- 
>> 2.20.1
>>
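
The retry loop in pci_reassign_root_bus_resources() can be sketched in
userspace as follows. All names here are hypothetical, and the allocator is
reduced to handing out space from a single window in list order, which is an
assumption made for brevity and not how the real resource sorting works. The
sketch only shows the control flow: assign, validate the old devices, ignore
one new device, retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	unsigned int size;	/* space the device needs */
	bool is_new;		/* hotplugged during this rescan */
	bool ignored;		/* stands in for PCI_DEV_IGNORE */
	bool assigned;		/* result of the last assignment pass */
};

/* One pass: hand out space in list order; report whether all old devices fit. */
static bool model_assign(struct model_dev *devs, size_t n, unsigned int capacity)
{
	unsigned int used = 0;
	bool old_ok = true;
	size_t i;

	for (i = 0; i < n; i++) {
		devs[i].assigned = false;
		if (devs[i].ignored)
			continue;
		if (devs[i].size <= capacity - used) {
			used += devs[i].size;
			devs[i].assigned = true;
		} else if (!devs[i].is_new) {
			old_ok = false;
		}
	}

	return old_ok;
}

/* Returns how many new devices were ignored, or -1 if old devices still fail. */
static int model_reassign(struct model_dev *devs, size_t n, unsigned int capacity)
{
	int ignored = 0;

	for (;;) {
		size_t i;

		if (model_assign(devs, n, capacity))
			return ignored;

		/* Find the next new device that has not been ignored yet. */
		for (i = 0; i < n; i++)
			if (devs[i].is_new && !devs[i].ignored)
				break;
		if (i == n)
			return -1;

		devs[i].ignored = true;
		ignored++;
	}
}
```

With a window of 12 units, a new device of size 8 listed before old devices of
sizes 6 and 4 makes the first pass fail; after the new device is ignored, the
second pass fits both old devices.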

end of thread, other threads:[~2019-03-27 19:34 UTC | newest]

Thread overview: 76+ messages
2019-03-11 13:31 [PATCH RFC v4 00/21] PCI: Allow BAR movement during hotplug Sergey Miroshnichenko
2019-03-11 13:31 ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 01/21] PCI: Fix writing invalid BARs during pci_restore_state() Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 14:02   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device() Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 19:00   ` Bjorn Helgaas
2019-03-26 19:00     ` Bjorn Helgaas
2019-03-27 17:11     ` Sergey Miroshnichenko
2019-03-27 17:11       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 03/21] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 19:13   ` Bjorn Helgaas
2019-03-27 17:13     ` Sergey Miroshnichenko
2019-03-27 17:13       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 04/21] PCI: Define PCI-specific version of the release_child_resources() Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 05/21] PCI: hotplug: Add a flag for the movable BARs feature Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 19:24   ` Bjorn Helgaas
2019-03-27 17:16     ` Sergey Miroshnichenko
2019-03-27 17:16       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 06/21] PCI: Pause the devices with movable BARs during rescan Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 07/21] PCI: Wake up bridges during rescan when movable BARs enabled Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 19:28   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 08/21] nvme-pci: Handle movable BARs Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:20   ` Bjorn Helgaas
2019-03-26 20:20     ` Bjorn Helgaas
2019-03-26 20:20     ` Bjorn Helgaas
2019-03-27 17:30     ` Sergey Miroshnichenko
2019-03-27 17:30       ` Sergey Miroshnichenko
2019-03-27 17:30       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 09/21] PCI: Mark immovable BARs with PCI_FIXED Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:28   ` Bjorn Helgaas
2019-03-27 17:03     ` David Laight
2019-03-27 17:39       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 10/21] PCI: Fix assigning of fixed prefetchable resources Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:37   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 11/21] PCI: Release and reassign the root bridge resources during rescan Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:41   ` Bjorn Helgaas
2019-03-27 17:40     ` Sergey Miroshnichenko
2019-03-27 17:40       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 12/21] PCI: Don't allow hotplugged devices to steal resources Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:55   ` Bjorn Helgaas
2019-03-27 18:02     ` Sergey Miroshnichenko
2019-03-27 18:02       ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 13/21] PCI: Include fixed BARs into the bus size calculating Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 14/21] PCI: Don't reserve memory for hotplug when enabled movable BARs Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:57   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 15/21] PCI: Allow the failed resources to be reassigned later Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 20:58   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 16/21] PCI: Calculate fixed areas of bridge windows based on fixed BARs Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 17/21] PCI: Calculate boundaries for bridge windows Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 21:01   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 18/21] PCI: Make sure bridge windows include their fixed BARs Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 19/21] PCI: Prioritize fixed BAR assigning over the movable ones Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-11 13:31 ` [PATCH RFC v4 20/21] PCI: pciehp: Add support for the movable BARs feature Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
2019-03-26 21:11   ` Bjorn Helgaas
2019-03-11 13:31 ` [PATCH RFC v4 21/21] powerpc/pci: Fix crash with enabled movable BARs Sergey Miroshnichenko
2019-03-11 13:31   ` Sergey Miroshnichenko
