* [PATCH v8 0/6] Update device MPS
@ 2013-08-22  3:24 Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 1/6] PCI: Drop "PCI-E" prefix from Max Payload Size message Yijing Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, Yijing Wang

v7->v8: Yijing updated patch 6/6 and tested this series on a hotplug-capable machine.
v6->v7: Bjorn's rework, plus additional minor cleanups.
v5->v6: Rework patch 1/2 and remove the unnecessary check pointed out by Bjorn.
        Remove the Cc: stable tag from patch 1/2 because it's not a serious bug.
v4->v5: Fix some spelling problems and move "mpss = 128 << dev->pcie_mpss" up so
        it can be reused; also remove the else braces for code style. Thanks to
        Jon for his review and comments.
v3->v4: Call pcie_bus_update_set() only when pcie_bus_config == PCIE_BUS_TUNE_OFF,
        as suggested by Jon Mason. Try to change the parent MPS when the parent
        device is a Root Port with only one slot connected to it and the parent
        MPS > child device MPSS. Also add a patch to fix an issue in
        pcie_find_smpss() when using "pci=pcie_bus_safe".
v2->v3: Update the Cc: stable tag as suggested by Li Zefan.
v1->v2: Update the patch log and remove Joe's Reported-by, because his problem
        was mainly caused by an incorrect BIOS setting, whereas this patch
        mainly fixes the bug caused by device hot-add. To be conservative, this
        version only fixes the MPS problem on hot-add: when the device
        MPS < parent MPS, this patch tries to update the device MPS.
        It seems unlikely that device MPS > parent MPS after a hot-add,
        so we don't handle that situation.

1. Test with "pci=pcie_bus_safe" appended to the kernel command line at boot
-+-[0000:40]-+-00.0-[0000:41]--
 |           +-05.0-[0000:45]--
 |           +-07.0-[0000:46]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
 |           |                 \-00.1  Intel Corporation 82576 Gigabit Network Connection

root port(40:07.0 mps=256, mpss=256), NIC devices(46:00.0/1, mps=256, mpss=512)

linux-ha2:/sys/bus/pci/slots/7 # echo 0 > power 
linux-ha2:/sys/bus/pci/slots/7 # echo 1 > power 
linux-ha2:/sys/bus/pci/slots/7 # dmesg
....................................
pcieport 0000:40:07.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  128 
pci 0000:46:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512 
pci 0000:46:00.1: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512 
pcieport 0000:40:07.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  128 
pci 0000:46:00.0: Max Payload Size set to  256/ 512 (was  256), Max Read Rq  512 
pci 0000:46:00.1: Max Payload Size set to  256/ 512 (was  256), Max Read Rq  512 
...................................


2. Test without any "pci=pcie_bus_xxx" parameter (default PCIE_BUS_TUNE_OFF)
root port(40:07.0 mps=256, mpss=256), NIC devices(46:00.0/1, mps=256, mpss=512)
linux-ha2:/sys/bus/pci/slots/7 # echo 0 > power
root port: mps=256, NIC devices: mps=128
linux-ha2:/sys/bus/pci/slots/7 # echo 1 > power
root port: mps=256, NIC devices: mps=256

All test results are OK.
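
For reference, the 256-byte result in test 1 is what you get by taking the
smallest MPSS in the fabric below the Root Port (256 for the port, 512 for
each NIC function). A minimal user-space sketch of that arithmetic follows;
mps_bytes() and smallest_mpss() are hypothetical helper names, not kernel
functions, and the array stands in for the real bus walk:

```c
#include <assert.h>
#include <stddef.h>

/* MPS/MPSS use a 3-bit encoding: 0 -> 128 bytes, 1 -> 256, ... 5 -> 4096. */
static unsigned int mps_bytes(unsigned int code)
{
	assert(code <= 5);	/* encodings 6 and 7 are reserved */
	return 128u << code;
}

/*
 * Hypothetical stand-in for the pcie_find_smpss() walk: return the
 * smallest MPSS encoding among a port and the devices below it.
 * The caller must pass at least one entry.
 */
static unsigned int smallest_mpss(const unsigned int *mpss, size_t n)
{
	unsigned int smpss = mpss[0];
	size_t i;

	for (i = 1; i < n; i++)
		if (mpss[i] < smpss)
			smpss = mpss[i];
	return smpss;
}
```

With the encodings { 1, 2, 2 } for 40:07.0, 46:00.0 and 46:00.1, the smallest
value is 1, i.e. 256 bytes, which matches the dmesg output in test 1.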

Bjorn Helgaas (3):
  PCI: Drop "PCI-E" prefix from Max Payload Size message
  PCI: Simplify pcie_bus_configure_settings() interface
  PCI: Simplify MPS test for Downstream Port

Yijing Wang (3):
  PCI: Remove unnecessary check for pcie_get_mps() failure
  PCI: Don't restrict MPS for slots below Root Ports
  PCI: update device mps when doing pci hotplug

 arch/powerpc/kernel/pci-common.c |    8 +---
 arch/tile/kernel/pci_gx.c        |    9 +---
 arch/x86/pci/acpi.c              |    9 +---
 drivers/pci/hotplug/pcihp_slot.c |    5 +-
 drivers/pci/pci.c                |    3 -
 drivers/pci/probe.c              |   90 +++++++++++++++++++++++++++++---------
 include/linux/pci.h              |    2 +-
 7 files changed, 78 insertions(+), 48 deletions(-)




* [PATCH v8 1/6] PCI: Drop "PCI-E" prefix from Max Payload Size message
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 2/6] PCI: Simplify pcie_bus_configure_settings() interface Yijing Wang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu

From: Bjorn Helgaas <bhelgaas@google.com>

The conventional spelling is "PCIe", but I think even that is superfluous,
so remove the whole thing.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/probe.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cf57fe7..1fa9e5e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1596,7 +1596,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	pcie_write_mps(dev, mps);
 	pcie_write_mrrs(dev);
 
-	dev_info(&dev->dev, "PCI-E Max Payload Size set to %4d/%4d (was %4d), "
+	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d), "
 		 "Max Read Rq %4d\n", pcie_get_mps(dev), 128 << dev->pcie_mpss,
 		 orig_mps, pcie_get_readrq(dev));
 
-- 
1.7.1




* [PATCH v8 2/6] PCI: Simplify pcie_bus_configure_settings() interface
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 1/6] PCI: Drop "PCI-E" prefix from Max Payload Size message Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 3/6] PCI: Remove unnecessary check for pcie_get_mps() failure Yijing Wang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu

From: Bjorn Helgaas <bhelgaas@google.com>

Based on a patch by Jon Mason (see URL below).

All users of pcie_bus_configure_settings() pass arguments of the form
"bus, bus->self->pcie_mpss".  The "mpss" argument is redundant since we
can easily look it up internally.  In addition, all callers check
"bus->self" for NULL, which we can also do internally.

This patch simplifies the interface and the callers.  No functional change.

Reference: http://lkml.kernel.org/r/1317048850-30728-2-git-send-email-mason@myri.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 arch/powerpc/kernel/pci-common.c |    8 ++------
 arch/tile/kernel/pci_gx.c        |    9 ++-------
 arch/x86/pci/acpi.c              |    9 ++-------
 drivers/pci/hotplug/pcihp_slot.c |    5 ++---
 drivers/pci/probe.c              |    7 +++++--
 include/linux/pci.h              |    2 +-
 6 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f46914a..d35ec34 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1672,12 +1672,8 @@ void pcibios_scan_phb(struct pci_controller *hose)
 	/* Configure PCI Express settings */
 	if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
 		struct pci_bus *child;
-		list_for_each_entry(child, &bus->children, node) {
-			struct pci_dev *self = child->self;
-			if (!self)
-				continue;
-			pcie_bus_configure_settings(child, self->pcie_mpss);
-		}
+		list_for_each_entry(child, &bus->children, node)
+			pcie_bus_configure_settings(child);
 	}
 }
 
diff --git a/arch/tile/kernel/pci_gx.c b/arch/tile/kernel/pci_gx.c
index 1142563..6640e7b 100644
--- a/arch/tile/kernel/pci_gx.c
+++ b/arch/tile/kernel/pci_gx.c
@@ -508,13 +508,8 @@ static void fixup_read_and_payload_sizes(struct pci_controller *controller)
 						rc_dev_cap.word);
 
 	/* Configure PCI Express MPS setting. */
-	list_for_each_entry(child, &root_bus->children, node) {
-		struct pci_dev *self = child->self;
-		if (!self)
-			continue;
-
-		pcie_bus_configure_settings(child, self->pcie_mpss);
-	}
+	list_for_each_entry(child, &root_bus->children, node)
+		pcie_bus_configure_settings(child);
 
 	/*
 	 * Set the mac_config register in trio based on the MPS/MRS of the link.
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index d641897..b30e937 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -568,13 +568,8 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	 */
 	if (bus) {
 		struct pci_bus *child;
-		list_for_each_entry(child, &bus->children, node) {
-			struct pci_dev *self = child->self;
-			if (!self)
-				continue;
-
-			pcie_bus_configure_settings(child, self->pcie_mpss);
-		}
+		list_for_each_entry(child, &bus->children, node)
+			pcie_bus_configure_settings(child);
 	}
 
 	if (bus && node != -1) {
diff --git a/drivers/pci/hotplug/pcihp_slot.c b/drivers/pci/hotplug/pcihp_slot.c
index fec2d5b..16f9203 100644
--- a/drivers/pci/hotplug/pcihp_slot.c
+++ b/drivers/pci/hotplug/pcihp_slot.c
@@ -160,9 +160,8 @@ void pci_configure_slot(struct pci_dev *dev)
 			(dev->class >> 8) == PCI_CLASS_BRIDGE_PCI)))
 		return;
 
-	if (dev->bus && dev->bus->self)
-		pcie_bus_configure_settings(dev->bus,
-					    dev->bus->self->pcie_mpss);
+	if (dev->bus)
+		pcie_bus_configure_settings(dev->bus);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1fa9e5e..627cc88 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1607,10 +1607,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
  * parents then children fashion.  If this changes, then this code will not
  * work as designed.
  */
-void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
+void pcie_bus_configure_settings(struct pci_bus *bus)
 {
 	u8 smpss;
 
+	if (!bus->self)
+		return;
+
 	if (!pci_is_pcie(bus->self))
 		return;
 
@@ -1625,7 +1628,7 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
 		smpss = 0;
 
 	if (pcie_bus_config == PCIE_BUS_SAFE) {
-		smpss = mpss;
+		smpss = bus->self->pcie_mpss;
 
 		pcie_find_smpss(bus->self, &smpss);
 		pci_walk_bus(bus, pcie_find_smpss, &smpss);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 408f047..13ee987 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -675,7 +675,7 @@ struct pci_driver {
 /* these external functions are only available when PCI support is enabled */
 #ifdef CONFIG_PCI
 
-void pcie_bus_configure_settings(struct pci_bus *bus, u8 smpss);
+void pcie_bus_configure_settings(struct pci_bus *bus);
 
 enum pcie_bus_config_types {
 	PCIE_BUS_TUNE_OFF,
-- 
1.7.1




* [PATCH v8 3/6] PCI: Remove unnecessary check for pcie_get_mps() failure
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 1/6] PCI: Drop "PCI-E" prefix from Max Payload Size message Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 2/6] PCI: Simplify pcie_bus_configure_settings() interface Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 4/6] PCI: Simplify MPS test for Downstream Port Yijing Wang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, Yijing Wang

After 59875ae489 ("PCI/core: Use PCI Express Capability accessors"),
pcie_get_mps() never returns an error, so don't bother to check for it.

No functional change.

[bhelgaas: changelog, fix pcie_get_mps() doc]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 10e3c4e..9cdba6a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3919,8 +3919,6 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 	if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
 		int mps = pcie_get_mps(dev);
 
-		if (mps < 0)
-			return mps;
 		if (mps < rq)
 			rq = mps;
 	}
@@ -3937,7 +3935,6 @@ EXPORT_SYMBOL(pcie_set_readrq);
  * @dev: PCI device to query
  *
  * Returns maximum payload size in bytes
- *    or appropriate error value.
  */
 int pcie_get_mps(struct pci_dev *dev)
 {
-- 
1.7.1




* [PATCH v8 4/6] PCI: Simplify MPS test for Downstream Port
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
                   ` (2 preceding siblings ...)
  2013-08-22  3:24 ` [PATCH v8 3/6] PCI: Remove unnecessary check for pcie_get_mps() failure Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 5/6] PCI: Don't restrict MPS for slots below Root Ports Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 6/6] PCI: update device mps when doing pci hotplug Yijing Wang
  5 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu

From: Bjorn Helgaas <bhelgaas@google.com>

PCIe hotplug bridges are always either Root Ports or Downstream Ports.  No
other device type can have a PCIe link leading downstream to a slot.

Root Ports don't have an upstream bridge, so "dev->is_hotplug_bridge &&
dev->bus->self" is true if and only if "dev" is a Downstream Port.  That
means we can simplify this by looking at the type of "dev" itself, without
looking upstream at all.
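
The equivalence can be checked on a small user-space model. This is only a
sketch: struct model_bridge and the MODEL_* names are hypothetical stand-ins
for struct pci_dev and the PCI_EXP_TYPE_* values, and it assumes the bridge
upstream of a Downstream Port is the Upstream Port of the same switch:

```c
#include <assert.h>
#include <stdbool.h>

enum model_type { MODEL_ROOT_PORT, MODEL_UPSTREAM_PORT, MODEL_DOWNSTREAM_PORT };

/* Simplified stand-in for a hotplug bridge; not the kernel's struct pci_dev. */
struct model_bridge {
	enum model_type type;		/* models pci_pcie_type(dev) */
	bool has_upstream;		/* models dev->bus->self != NULL */
	enum model_type upstream_type;	/* only meaningful if has_upstream */
};

/* Old test: look at the bridge upstream of "dev". */
static bool old_test(const struct model_bridge *dev)
{
	return dev->has_upstream && dev->upstream_type != MODEL_ROOT_PORT;
}

/* New test: look only at "dev" itself. */
static bool new_test(const struct model_bridge *dev)
{
	return dev->type != MODEL_ROOT_PORT;
}

/* The two shapes a hotplug bridge can take, per the reasoning above. */
static void check_equivalence(void)
{
	const struct model_bridge root = {
		MODEL_ROOT_PORT, false, MODEL_ROOT_PORT
	};
	const struct model_bridge down = {
		MODEL_DOWNSTREAM_PORT, true, MODEL_UPSTREAM_PORT
	};

	assert(old_test(&root) == new_test(&root));	/* both false */
	assert(old_test(&down) == new_test(&down));	/* both true */
}
```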

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/probe.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 627cc88..87be31b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1507,8 +1507,7 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
 	 * will occur as normal.
 	 */
 	if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
-	     (dev->bus->self &&
-	      pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)))
+	    pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT))
 		*smpss = 0;
 
 	if (*smpss > dev->pcie_mpss)
-- 
1.7.1




* [PATCH v8 5/6] PCI: Don't restrict MPS for slots below Root Ports
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
                   ` (3 preceding siblings ...)
  2013-08-22  3:24 ` [PATCH v8 4/6] PCI: Simplify MPS test for Downstream Port Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22  3:24 ` [PATCH v8 6/6] PCI: update device mps when doing pci hotplug Yijing Wang
  5 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, Yijing Wang

When booting with "pci=pcie_bus_safe", we previously limited the
fabric MPS to 128 when we found:

  (1) A hotplug-capable Downstream Port ("dev->is_hotplug_bridge &&
      pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT"), or

  (2) A hotplug-capable Root Port with a slot that was either empty or
      contained a multi-function device ("dev->is_hotplug_bridge &&
      !list_is_singular(&dev->bus->devices)")

Part (1) is valid, but part (2) is not.

After a hot-add in the slot below a Root Port, we can reconfigure all
MPS values in the fabric below the Root Port because the new device is
the only thing below the Root Port and there are no active drivers.
Therefore, there's no reason to limit the MPS for Root Ports, no
matter what's in the slot.

Test info:

    -+-[0000:40]-+-07.0-[0000:46]--+-00.0  Intel 82576 NIC
                                   \-00.1  Intel 82576 NIC

    0000:40:07.0 Root Port bridge to [bus 46] (MPS supported=256)
    0000:46:00.0 Endpoint                     (MPS supported=512)
    0000:46:00.1 Endpoint                     (MPS supported=512)

    # echo 0 > /sys/bus/pci/slots/7/power
    # echo 1 > /sys/bus/pci/slots/7/power
    # dmesg
    ...
    pcieport 0000:40:07.0: PCI-E Max Payload Size set to 256/ 256 (was 256)
    pci 0000:46:00.0:      PCI-E Max Payload Size set to 256/ 512 (was 128)
    pci 0000:46:00.1:      PCI-E Max Payload Size set to 256/ 512 (was 128)

Before this change, we set MPS to 128 for the Root Port and both NICs
because the slot contained a multi-function device and

    dev->is_hotplug_bridge && !list_is_singular(&dev->bus->devices)

was true.  After this change, we set it to 256.
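
The change in behavior can be sketched with a user-space model of the two
conditions. The struct and function names below are hypothetical and only
mirror the fields the test looks at; they are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the fields the test looks at. */
struct model_bridge {
	bool is_hotplug_bridge;	/* models dev->is_hotplug_bridge */
	bool is_root_port;	/* models pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT */
	bool bus_is_singular;	/* models list_is_singular(&dev->bus->devices) */
};

/* Before this patch: parts (1) and (2) both force the fabric MPS to 128. */
static bool limits_to_128_before(const struct model_bridge *b)
{
	return b->is_hotplug_bridge &&
	       (!b->bus_is_singular || !b->is_root_port);
}

/* After this patch: only part (1), a hotplug-capable non-Root Port. */
static bool limits_to_128_after(const struct model_bridge *b)
{
	return b->is_hotplug_bridge && !b->is_root_port;
}

static void check_root_port_case(void)
{
	/* Hotplug Root Port for which the list_is_singular() test fails. */
	const struct model_bridge rp = { true, true, false };
	/* Hotplug Downstream Port. */
	const struct model_bridge dp = { true, false, true };

	assert(limits_to_128_before(&rp));	/* old code limited MPS */
	assert(!limits_to_128_after(&rp));	/* new code does not */
	assert(limits_to_128_before(&dp) && limits_to_128_after(&dp));
}
```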

[bhelgaas: changelog, comments, split out upstream bridge check]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
 drivers/pci/probe.c |   32 ++++++++++++++++----------------
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 87be31b..4afd158 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1491,23 +1491,23 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
 	if (!pci_is_pcie(dev))
 		return 0;
 
-	/* For PCIE hotplug enabled slots not connected directly to a
-	 * PCI-E root port, there can be problems when hotplugging
-	 * devices.  This is due to the possibility of hotplugging a
-	 * device into the fabric with a smaller MPS that the devices
-	 * currently running have configured.  Modifying the MPS on the
-	 * running devices could cause a fatal bus error due to an
-	 * incoming frame being larger than the newly configured MPS.
-	 * To work around this, the MPS for the entire fabric must be
-	 * set to the minimum size.  Any devices hotplugged into this
-	 * fabric will have the minimum MPS set.  If the PCI hotplug
-	 * slot is directly connected to the root port and there are not
-	 * other devices on the fabric (which seems to be the most
-	 * common case), then this is not an issue and MPS discovery
-	 * will occur as normal.
+	/*
+	 * We don't have a way to change MPS settings on devices that have
+	 * drivers attached.  A hot-added device might support only the minimum
+	 * MPS setting (MPS=128).  Therefore, if the fabric contains a bridge
+	 * where devices may be hot-added, we limit the fabric MPS to 128 so
+	 * hot-added devices will work correctly.
+	 *
+	 * However, if we hot-add a device to a slot directly below a Root
+	 * Port, it's impossible for there to be other existing devices below
+	 * the port.  We don't limit the MPS in this case because we can
+	 * reconfigure MPS on both the Root Port and the hot-added device,
+	 * and there are no other devices involved.
+	 *
+	 * Note that this PCIE_BUS_SAFE path assumes no peer-to-peer DMA.
 	 */
-	if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
-	    pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT))
+	if (dev->is_hotplug_bridge &&
+	    pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
 		*smpss = 0;
 
 	if (*smpss > dev->pcie_mpss)
-- 
1.7.1




* [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
                   ` (4 preceding siblings ...)
  2013-08-22  3:24 ` [PATCH v8 5/6] PCI: Don't restrict MPS for slots below Root Ports Yijing Wang
@ 2013-08-22  3:24 ` Yijing Wang
  2013-08-22 18:18   ` Bjorn Helgaas
  5 siblings, 1 reply; 17+ messages in thread
From: Yijing Wang @ 2013-08-22  3:24 UTC
  To: Bjorn Helgaas
  Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, Yijing Wang, stable

Currently we don't update a device's MPS value when doing a
PCI device hot-add. The hot-added device's MPS is set to the
default value (128B), but the upstream port's MPS may be
larger than 128B if it was set by firmware during system
bootup. In this case the newly added device may not work
normally. This patch tries to set the hot-added device's MPS
equal to its parent's MPS; if the device's MPSS < parent MPS
and the parent is not a Root Port, print a warning.
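
The decision made for each hot-added device can be summarized with the
user-space sketch below. It follows the branches of pcie_bus_update_set() in
this patch, but enum mps_action and hot_add_action() are hypothetical names
used only for illustration, and all sizes are in bytes:

```c
#include <assert.h>
#include <stdbool.h>

enum mps_action {
	MPS_KEEP,		/* device MPS already >= parent MPS */
	MPS_RAISE_DEVICE,	/* write the parent's MPS to the device */
	MPS_LOWER_BOTH,		/* parent is a Root Port: write device MPSS to both */
	MPS_WARN,		/* cannot be fixed here; only warn */
};

static enum mps_action hot_add_action(int mps, int p_mps, int mpss,
				      bool parent_is_root_port)
{
	assert(mps >= 128 && p_mps >= 128 && mpss >= 128);

	if (mps >= p_mps)
		return MPS_KEEP;
	if (mpss >= p_mps)
		return MPS_RAISE_DEVICE;
	if (parent_is_root_port)
		return MPS_LOWER_BOTH;
	return MPS_WARN;
}
```

For the 82576 NIC in the cover letter (MPS 128 after power-on, MPSS 512,
Root Port MPS 256) this gives MPS_RAISE_DEVICE, i.e. the device is set to 256.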

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: stable@vger.kernel.org # 3.4+
---
 drivers/pci/probe.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4afd158..06e88c5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1602,6 +1602,43 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static int pcie_bus_update_set(struct pci_dev *dev, void *data)
+{
+	int mps, p_mps, mpss;
+	struct pci_dev *parent;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self)
+		return 0;
+
+	parent = dev->bus->self;
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(dev->bus->self);
+
+	if (mps >= p_mps)
+		return 0;
+
+	/* we only update the device mps, unless its parent device is root port,
+	 * and it is the only slot directly connected to root port.
+	 */
+	mpss = 128 << dev->pcie_mpss;
+	if (mpss >= p_mps) {
+		pcie_write_mps(dev, p_mps);
+	} else if (pci_pcie_type(parent) == PCI_EXP_TYPE_ROOT_PORT) {
+		pcie_write_mps(parent, mpss);
+		pcie_write_mps(dev, mpss);
+	} else
+		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
+				mps, 128 << dev->pcie_mpss, p_mps);
+	return 0;
+}
+
+static void pcie_bus_update_setting(struct pci_bus *bus)
+{
+	if (bus->self->is_hotplug_bridge)
+		pci_walk_bus(bus, pcie_bus_update_set, NULL);
+}
+
 /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
  * parents then children fashion.  If this changes, then this code will not
  * work as designed.
@@ -1616,8 +1653,17 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
 	if (!pci_is_pcie(bus->self))
 		return;
 
-	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
+	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+		/* Sometimes we should update device mps here,
+		 * eg. after hot add, device mps value will be
+		 * set to default(128B), but the upstream port
+		 * mps value may be larger than 128B, if we do
+		 * not update the device mps, it maybe can not
+		 * work normally.
+		 */
+		pcie_bus_update_setting(bus);
 		return;
+	}
 
 	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
 	 * to be aware to the MPS of the destination.  To work around this,
-- 
1.7.1




* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-22  3:24 ` [PATCH v8 6/6] PCI: update device mps when doing pci hotplug Yijing Wang
@ 2013-08-22 18:18   ` Bjorn Helgaas
  2013-08-26  3:42     ` Yijing Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-08-22 18:18 UTC
  To: Yijing Wang; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, joe.jin

[+cc Joe]

On Thu, Aug 22, 2013 at 11:24:48AM +0800, Yijing Wang wrote:
> Currently we don't update a device's MPS value when doing a
> PCI device hot-add. The hot-added device's MPS is set to the
> default value (128B), but the upstream port's MPS may be
> larger than 128B if it was set by firmware during system
> bootup. In this case the newly added device may not work
> normally. This patch tries to set the hot-added device's MPS
> equal to its parent's MPS; if the device's MPSS < parent MPS
> and the parent is not a Root Port, print a warning.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> Cc: stable@vger.kernel.org # 3.4+
> ---
>  drivers/pci/probe.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 47 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4afd158..06e88c5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1602,6 +1602,43 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>  
> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self)
> +		return 0;
> +
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(dev->bus->self);
> +
> +	if (mps >= p_mps)
> +		return 0;
> +
> +	/* we only update the device mps, unless its parent device is root port,
> +	 * and it is the only slot directly connected to root port.
> +	 */
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss >= p_mps) {
> +		pcie_write_mps(dev, p_mps);
> +	} else if (pci_pcie_type(parent) == PCI_EXP_TYPE_ROOT_PORT) {
> +		pcie_write_mps(parent, mpss);
> +		pcie_write_mps(dev, mpss);
> +	} else
> +		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
> +				mps, 128 << dev->pcie_mpss, p_mps);
> +	return 0;
> +}
> +
> +static void pcie_bus_update_setting(struct pci_bus *bus)
> +{
> +	if (bus->self->is_hotplug_bridge)
> +		pci_walk_bus(bus, pcie_bus_update_set, NULL);
> +}
> +
>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>   * parents then children fashion.  If this changes, then this code will not
>   * work as designed.
> @@ -1616,8 +1653,17 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
>  	if (!pci_is_pcie(bus->self))
>  		return;
>  
> -	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
> +	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		/* Sometimes we should update device mps here,
> +		 * eg. after hot add, device mps value will be
> +		 * set to default(128B), but the upstream port
> +		 * mps value may be larger than 128B, if we do
> +		 * not update the device mps, it maybe can not
> +		 * work normally.
> +		 */
> +		pcie_bus_update_setting(bus);

I think the strategy of updating the device MPS when possible makes
sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
That mode is documented as "Disable PCIe MPS tuning and use the
BIOS-configured MPS defaults."  This patch changes that to something
like "Disable PCIe MPS tuning, except for hot-added devices" and there
is no longer a way to tell Linux to never touch MPS.

Eventually, I think the default mode should change to PCIE_BUS_SAFE,
where Linux changes MPS settings at boot-time and at hotplug-time to
make sure every device works.  (This mode assumes no peer-to-peer
DMA.)  I know this was tried in the past, and we tripped over all
sorts of issues, but it's not clear how many were problems with the
Linux code and how many were unsolvable BIOS or platform issues.

Then we'd have these choices:

  PCIE_BUS_TUNE_OFF	Never touch MPS
  PCIE_BUS_PEER2PEER	Set all MPS to 128, so peer-to-peer DMA works
  PCIE_BUS_SAFE		Configure each device with largest safe MPS
			(assumes no peer-to-peer DMA)
  PCIE_BUS_PERFORMANCE	Use MRRS in addition to MPS
			(assumes no peer-to-peer DMA)

The hot-add issue [1] could be regarded as a BIOS bug -- the BIOS
programmed a hotplug bridge with MPS=256.  A hot-added device powers
up with MPS=128, so it's only safe for BIOS to set MPS=256 if the OS
is smart enough to change the bridge MPS, the device MPS, or both, at
hot-add time.  That doesn't seem like a good assumption for a BIOS to
make.

I think we should always *warn* about potential MPS issues, even in
PCIE_BUS_TUNE_OFF mode.  That would help diagnose the hot-add issue as
well as issues like the ones Joe Jin reported [2] and [3].

I think what we should do is *always* call pcie_bus_configure_set(),
no matter what mode we're in, but make pcie_bus_configure_set() smart
enough to do different things (print warnings, adjust settings, do the
stuff you added in pcie_bus_update_set(), etc.) depending on what mode
we're in.
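
Very roughly, and only as a user-space sketch of the shape I have in mind:
the MODEL_* names, struct model_outcome and configure_one() below are all
hypothetical, and treating "device MPS < parent MPS" as the condition worth
warning about is an assumption taken from the check in your patch:

```c
#include <assert.h>
#include <stdbool.h>

enum model_mode {
	MODEL_TUNE_OFF,
	MODEL_PEER2PEER,
	MODEL_SAFE,
	MODEL_PERFORMANCE,
};

struct model_outcome {
	bool warn;	/* print a diagnostic about a possible MPS problem */
	bool write_mps;	/* this mode is allowed to change MPS */
};

/* Called for every device, in every mode; the mode decides what happens. */
static struct model_outcome configure_one(enum model_mode mode,
					  int mps, int p_mps)
{
	struct model_outcome out = { false, false };

	assert(mps >= 128 && p_mps >= 128);

	if (mode == MODEL_TUNE_OFF) {
		/* Never touch MPS; only report a device/parent mismatch. */
		out.warn = mps < p_mps;
		return out;
	}

	/* Every other mode is allowed to adjust settings. */
	out.write_mps = true;
	return out;
}
```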

Bjorn

>  		return;
> +	}
>  
>  	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
>  	 * to be aware to the MPS of the destination.  To work around this,
> -- 
> 1.7.1
> 
> 

[1] https://bugzilla.kernel.org/show_bug.cgi?id=60671
[2] http://lkml.kernel.org/r/4FFA9B96.6040901@oracle.com
[3] http://lkml.kernel.org/r/509B5038.8090304@oracle.com


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-22 18:18   ` Bjorn Helgaas
@ 2013-08-26  3:42     ` Yijing Wang
  2013-08-26 21:33       ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Yijing Wang @ 2013-08-26  3:42 UTC
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, jiang.liu, joe.jin

> I think the strategy of updating the device MPS when possible makes
> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
> That mode is documented as "Disable PCIe MPS tuning and use the
> BIOS-configured MPS defaults."  This patch changes that to something
> like "Disable PCIe MPS tuning, except for hot-added devices" and there
> is no longer a way to tell Linux to never touch MPS.

Hi Bjorn,
   Thanks for your review and comments!

As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
BIOS-configured MPS defaults."  But a hotplug action changes the BIOS default MPS
setting (power off resets all registers).  So if we only touch the MPS of the
newly inserted device, I think that may be reasonable.

> 
> Eventually, I think the default mode should change to PCIE_BUS_SAFE,
> where Linux changes MPS settings at boot-time and at hotplug-time to
> make sure every device works.  (This mode assumes no peer-to-peer
> DMA.)  I know this was tried in the past, and we tripped over all
> sorts of issues, but it's not clear how many were problems with the
> Linux code and how many were unsolvable BIOS or platform issues.

Agree.

> 
> Then we'd have these choices:
> 
>   PCIE_BUS_TUNE_OFF	Never touch MPS
>   PCIE_BUS_PEER2PEER	Set all MPS to 128, so peer-to-peer DMA works
>   PCIE_BUS_SAFE		Configure each device with largest safe MPS
> 			(assumes no peer-to-peer DMA)
>   PCIE_BUS_PERFORMANCE	Use MRRS in addition to MPS
> 			(assumes no peer-to-peer DMA)
> 
> The hot-add issue [1] could be regarded as a BIOS bug -- the BIOS
> programmed a hotplug bridge with MPS=256.  A hot-added device powers
> up with MPS=128, so it's only safe for BIOS to set MPS=256 if the OS
> is smart enough to change the bridge MPS, the device MPS, or both, at
> hot-add time.  That doesn't seem like a good assumption for a BIOS to
> make.
> 
> I think we should always *warn* about potential MPS issues, even in
> PCIE_BUS_TUNE_OFF mode.  That would help diagnose the hot-add issue as
> well as issues like the ones Joe Jin reported [2] and [3].

OK, I will add a new patch to print a warning where necessary, as in the cases
Joe Jin reported.  Note, though, that the hotplug issue [1] and Joe's reports
[2] and [3] were only encountered in PCIE_BUS_TUNE_OFF mode.

> 
> I think what we should do is *always* call pcie_bus_configure_set(),
> no matter what mode we're in, but make pcie_bus_configure_set() smart
> enough to do different things (print warnings, adjust settings, do the
> stuff you added in pcie_bus_update_set(), etc.) depending on what mode
> we're in.


OK, I will try to rework this patch.


Thanks!
Yijing.


>> +	}
>>  
>>  	/* FIXME - Peer to peer DMA is possible, though the endpoint would need
>>  	 * to be aware to the MPS of the destination.  To work around this,
>> -- 
>> 1.7.1
>>
>>
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=60671
> [2] http://lkml.kernel.org/r/4FFA9B96.6040901@oracle.com
> [3] http://lkml.kernel.org/r/509B5038.8090304@oracle.com
> 
> .
> 


-- 
Thanks!
Yijing



* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-26  3:42     ` Yijing Wang
@ 2013-08-26 21:33       ` Bjorn Helgaas
  2013-08-27  0:39         ` Yinghai Lu
  2013-08-27  1:49         ` Yijing Wang
  0 siblings, 2 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2013-08-26 21:33 UTC (permalink / raw)
  To: Yijing Wang; +Cc: Jon Mason, linux-pci, Hanjun Guo, Jiang Liu, Jin Feng

On Sun, Aug 25, 2013 at 9:42 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> I think the strategy of updating the device MPS when possible makes
>> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
>> That mode is documented as "Disable PCIe MPS tuning and use the
>> BIOS-configured MPS defaults."  This patch changes that to something
>> like "Disable PCIe MPS tuning, except for hot-added devices" and there
>> is no longer a way to tell Linux to never touch MPS.
>
> Hi Bjorn,
>    Thanks for your review and comments!
>
> As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
> BIOS-configured MPS defaults.", But hotplug action make the BIOS default mps setting
> changed(power off, all registers reset). So If we only touch the newly inserted device mps,
> I think maybe it's reasonable.

I agree, it might be reasonable.  But I think it's too hard to
document that behavior.  I think it's better to have behavior that is
easy to understand and explain, even if it is slightly suboptimal.

The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
want to touch any MPS settings in that mode, I don't see a way to
safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
problem with hot-added devices not working because MPS is incorrect).
In the long term, I hope we can fix it by making the default
PCIE_BUS_SAFE, but that doesn't help right now.

That leaves us with only the workaround of booting the Huawei rh5885
box with "pci=pcie_bus_safe".

I'm willing to accept that because I think we can argue that this is
really a BIOS defect.  The BIOS *can* program MPS to values that will
be safe for hotplug even if the OS does nothing, i.e., it can set
MPS=128 in all paths that lead to a hotpluggable slot.  I think that's
probably what this BIOS *should* do, since it has no way of knowing
whether the OS will support hotplug or whether the OS will reprogram
any MPS values.

Bjorn


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-26 21:33       ` Bjorn Helgaas
@ 2013-08-27  0:39         ` Yinghai Lu
  2013-08-27  1:49         ` Yijing Wang
  1 sibling, 0 replies; 17+ messages in thread
From: Yinghai Lu @ 2013-08-27  0:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu, Jin Feng

On Mon, Aug 26, 2013 at 2:33 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
> want to touch any MPS settings in that mode, I don't see a way to
> safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
> problem with hot-added devices not working because MPS is incorrect).
> In the long term, I hope we can fix it by making the default
> PCIE_BUS_SAFE, but that doesn't help right now.
>
> That leaves us with only the workaround of booting the Huawei rh5885
> box with "pci=pcie_bus_safe".
>
> I'm willing to accept that because I think we can argue that this is
> really a BIOS defect.  The BIOS *can* program MPS to values that will
> be safe for hotplug even if the OS does nothing, i.e., it can set
> MPS=128 in all paths that lead to a hotpluggable slot.  I think that's
> probably what this BIOS *should* do, since it has no way of knowing
> whether the OS will support hotplug or whether the OS will reprogram
> any MPS values.

BIOS should have several settings for MPS:
1. 128
2. auto or performance.

When it is set to auto, Linux has problems with hot-add.

The default used to be 128, which is OK. But as of Sandy Bridge and
Ivy Bridge, the chipset can support more than 128.

The BIOS wants to set it to auto.
The BIOS guys claim that other OSes are OK with auto and that only
Linux has a problem.

So maybe it's time for us to change the default to pcie_bus_perf iff
1. we detect that a PCIe bridge with hotplug support is present
2. mpss for those bridges is not set to 128. --- keep this optional?

At the same time, issue a warning that we changed to perf; if users
have problems, they can override it from the command line.

Yinghai


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-26 21:33       ` Bjorn Helgaas
  2013-08-27  0:39         ` Yinghai Lu
@ 2013-08-27  1:49         ` Yijing Wang
  1 sibling, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2013-08-27  1:49 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Jon Mason, linux-pci, Hanjun Guo, Jiang Liu, Jin Feng

On 2013/8/27 5:33, Bjorn Helgaas wrote:
> On Sun, Aug 25, 2013 at 9:42 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> I think the strategy of updating the device MPS when possible makes
>>> sense, but I don't think we should do it in PCIE_BUS_TUNE_OFF mode.
>>> That mode is documented as "Disable PCIe MPS tuning and use the
>>> BIOS-configured MPS defaults."  This patch changes that to something
>>> like "Disable PCIe MPS tuning, except for hot-added devices" and there
>>> is no longer a way to tell Linux to never touch MPS.
>>
>> Hi Bjorn,
>>    Thanks for your review and comments!
>>
>> As you mentioned, PCIE_BUS_TUNE_OFF means "Disable PCIe MPS tuning and use the
>> BIOS-configured MPS defaults.", But hotplug action make the BIOS default mps setting
>> changed(power off, all registers reset). So If we only touch the newly inserted device mps,
>> I think maybe it's reasonable.
> 
> I agree, it might be reasonable.  But I think it's too hard to
> document that behavior.  I think it's better to have behavior that is
> easy to understand and explain, even if it is slightly suboptimal.
> 
> The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
> want to touch any MPS settings in that mode, I don't see a way to
> safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
> problem with hot-added devices not working because MPS is incorrect).
> In the long term, I hope we can fix it by making the default
> PCIE_BUS_SAFE, but that doesn't help right now.

I also think we should consider changing the default mode to pcie_bus_safe.
Jon mentioned that a number of issues were discovered on some x86 chipsets,
but gave no further details.
If we use PCIE_BUS_TUNE_OFF all the time, we will never have a chance to fix these issues.

> 
> That leaves us with only the workaround of booting the Huawei rh5885
> box with "pci=pcie_bus_safe".
> 
> I'm willing to accept that because I think we can argue that this is
> really a BIOS defect.  The BIOS *can* program MPS to values that will
> be safe for hotplug even if the OS does nothing, i.e., it can set
> MPS=128 in all paths that lead to a hotpluggable slot.  I think that's
> probably what this BIOS *should* do, since it has no way of knowing
> whether the OS will support hotplug or whether the OS will reprogram
> any MPS values.
> 

Yes, for now we have the BIOS program every MPS to 128 to avoid this problem.

Thanks!
Yijing.





* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-29 22:46     ` Yinghai Lu
@ 2013-08-30 15:41       ` Bjorn Helgaas
  0 siblings, 0 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2013-08-30 15:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu,
	Jin Feng, linux-kernel

On Thu, Aug 29, 2013 at 4:46 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Aug 29, 2013 at 3:22 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> Note that I think Linux *should* eventually actively manage MPS, and
>> when it does, case 3 should "just work".  I just don't understand what
>> the point of the BIOS using case 3 is.
>>
>> I suppose other OSes must get better performance in this "auto" mode?
>
> Yes.

My take on this is that "auto" mode really means "Windows" mode,
because it's a BIOS workaround for shortcomings in Windows.  It's
tailored to do system-wide MPS configuration that Windows doesn't do,
while relying on Windows to do minimal reconfiguration after a
hot-plug.  I don't feel any particular urgency to make Linux work with
that.

In my opinion, a BIOS should configure the machine in the safest
possible way.  Then everything works, and if we boot an OS that is
smart enough to reconfigure it in a more optimal way, that's great,
but it's not required.  For MPS, I think that means configuring the
machine as I outlined in case 1 (MPS=128 always) or case 2 (larger MPS
allowed on non-hotplug paths if the BIOS knows the root complex splits
packets).

>> (What exactly is that mode, anyway?)  That means the other OS must be
>> smart enough to deal with hotplug device replacement, but not smart
>> enough to configure MPS all by itself starting from scratch.  I don't
>> know what rules would tell us "this MPS must be configured by the BIOS
>> and the OS should leave it alone" and "the OS must configure MPS on
>> this device for hotplug."  How can we make sense out of that?
>
> So my suggestion:
> We scan the MPS of the bridges to find out whether any is set to
> something other than 128. If there is any bridge whose MPS is not 128
> and that is a hotplug slot, we change to PCI_BUS_TUNE_PERF for that system.

That seems too arbitrary and magic to me.  I don't really want to set
modes based on what we discover in the machine.  If we're going to do
MPS config, we should just do it right and do it everywhere.

This discussion is not really going anywhere because we don't have any
concrete changes on the table, so I'm going to try to resist
continuing this thread :)

Bjorn


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-29 22:22   ` Bjorn Helgaas
@ 2013-08-29 22:46     ` Yinghai Lu
  2013-08-30 15:41       ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2013-08-29 22:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu,
	Jin Feng, linux-kernel

On Thu, Aug 29, 2013 at 3:22 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> Note that I think Linux *should* eventually actively manage MPS, and
> when it does, case 3 should "just work".  I just don't understand what
> the point of the BIOS using case 3 is.
>
> I suppose other OSes must get better performance in this "auto" mode?

Yes.

> (What exactly is that mode, anyway?)  That means the other OS must be
> smart enough to deal with hotplug device replacement, but not smart
> enough to configure MPS all by itself starting from scratch.  I don't
> know what rules would tell us "this MPS must be configured by the BIOS
> and the OS should leave it alone" and "the OS must configure MPS on
> this device for hotplug."  How can we make sense out of that?

So my suggestion:
We scan the MPS of the bridges to find out whether any is set to
something other than 128. If there is any bridge whose MPS is not 128
and that is a hotplug slot, we change to PCI_BUS_TUNE_PERF for that system.
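
A minimal sketch of that scan, for illustration only. The bridge struct,
the mode names and pick_mode() are hypothetical and do not correspond to
existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "leave MPS alone" and "tune for performance". */
enum mps_mode { MODE_TUNE_OFF, MODE_PERFORMANCE };

/* Hypothetical per-bridge summary. */
struct bridge {
	int mps;	/* programmed Max Payload Size in bytes */
	bool hotplug;	/* true if the bridge is a hotplug slot */
};

/*
 * Switch to the performance mode as soon as one hotplug bridge has been
 * programmed with an MPS other than 128; otherwise keep the default.
 */
static enum mps_mode pick_mode(const struct bridge *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(b[i].mps >= 128);	/* 128 is the smallest legal MPS */
		if (b[i].hotplug && b[i].mps != 128)
			return MODE_PERFORMANCE;
	}
	return MODE_TUNE_OFF;
}
```

A non-hotplug bridge at MPS=256 does not trigger the switch in this sketch;
only a hotplug bridge with MPS other than 128 does.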

Yinghai


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-29 21:47 ` Yinghai Lu
@ 2013-08-29 22:22   ` Bjorn Helgaas
  2013-08-29 22:46     ` Yinghai Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-08-29 22:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu,
	Jin Feng, linux-kernel

On Thu, Aug 29, 2013 at 3:47 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Aug 29, 2013 at 2:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> [+cc linux-kernel, since more folks might be interested]
>
>> I don't know what the BIOS "auto" setting means, but it must mean
>> something in case 3, because that's the only case where OS support is
>> required.  But if the OS is smart enough to manage MPS for hot-added
>> devices, why can't the OS also program MPS for the whole system at
>> boot-time?
>>
>> That's why I don't understand what BIOS wants to do.  It sounds like
>> they want the performance benefit of larger MPS for devices present in
>> hot-plug slots at boot-time, even if the OS doesn't actively manage
>> MPS and things blow up if that device is replaced with one that
>> supports a smaller MPS.  That choice doesn't make sense.
>>
>> In case 3, with a non-MPS-aware OS, you get better performance for a
>> while, but blow up if a card is replaced.  And with an MPS-aware OS,
>> there should be no advantage to case 3: the OS should be able to get
>> good performance by programming MPS itself, even without help from the
>> BIOS.
>
> With the OS default settings in case 3, the other two OSes are OK with
> hotplug, but Linux is not.

I'm not disputing that.  I said plainly that in case 3, things blow up
if the OS doesn't actively manage MPS.  By default Linux doesn't touch
MPS, so it blows up if a card is replaced.

I am suggesting that it doesn't make any sense for a BIOS to use case
3.  Please make an argument for why it *does* make sense to use case
3.

Note that I think Linux *should* eventually actively manage MPS, and
when it does, case 3 should "just work".  I just don't understand what
the point of the BIOS using case 3 is.

I suppose other OSes must get better performance in this "auto" mode?
(What exactly is that mode, anyway?)  That means the other OS must be
smart enough to deal with hotplug device replacement, but not smart
enough to configure MPS all by itself starting from scratch.  I don't
know what rules would tell us "this MPS must be configured by the BIOS
and the OS should leave it alone" and "the OS must configure MPS on
this device for hotplug."  How can we make sense out of that?

Bjorn


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
  2013-08-29 21:09 Bjorn Helgaas
@ 2013-08-29 21:47 ` Yinghai Lu
  2013-08-29 22:22   ` Bjorn Helgaas
  0 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2013-08-29 21:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu,
	Jin Feng, linux-kernel

On Thu, Aug 29, 2013 at 2:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc linux-kernel, since more folks might be interested]

> I don't know what the BIOS "auto" setting means, but it must mean
> something in case 3, because that's the only case where OS support is
> required.  But if the OS is smart enough to manage MPS for hot-added
> devices, why can't the OS also program MPS for the whole system at
> boot-time?
>
> That's why I don't understand what BIOS wants to do.  It sounds like
> they want the performance benefit of larger MPS for devices present in
> hot-plug slots at boot-time, even if the OS doesn't actively manage
> MPS and things blow up if that device is replaced with one that
> supports a smaller MPS.  That choice doesn't make sense.
>
> In case 3, with a non-MPS-aware OS, you get better performance for a
> while, but blow up if a card is replaced.  And with an MPS-aware OS,
> there should be no advantage to case 3: the OS should be able to get
> good performance by programming MPS itself, even without help from the
> BIOS.

With the OS default settings in case 3, the other two OSes are OK with
hotplug, but Linux is not.

Yinghai


* Re: [PATCH v8 6/6] PCI: update device mps when doing pci hotplug
@ 2013-08-29 21:09 Bjorn Helgaas
  2013-08-29 21:47 ` Yinghai Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2013-08-29 21:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Yijing Wang, Jon Mason, linux-pci, Hanjun Guo, Jiang Liu,
	Jin Feng, linux-kernel

[+cc linux-kernel, since more folks might be interested]

On Mon, Aug 26, 2013 at 6:39 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Aug 26, 2013 at 2:33 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> The current Linux default is PCIE_BUS_TUNE_OFF, and given that I don't
>> want to touch any MPS settings in that mode, I don't see a way to
>> safely fix https://bugzilla.kernel.org/show_bug.cgi?id=60671 (the
>> problem with hot-added devices not working because MPS is incorrect).
>> In the long term, I hope we can fix it by making the default
>> PCIE_BUS_SAFE, but that doesn't help right now.
>>
>> That leaves us with only the workaround of booting the Huawei rh5885
>> box with "pci=pcie_bus_safe".
>>
>> I'm willing to accept that because I think we can argue that this is
>> really a BIOS defect.  The BIOS *can* program MPS to values that will
>> be safe for hotplug even if the OS does nothing, i.e., it can set
>> MPS=128 in all paths that lead to a hotpluggable slot.  I think that's
>> probably what this BIOS *should* do, since it has no way of knowing
>> whether the OS will support hotplug or whether the OS will reprogram
>> any MPS values.
>
> BIOS should have several settings for MPS:
> 1. 128
> 2. auto or performance.
>
> When it is set to auto, Linux has problems with hot-add.
>
> The default used to be 128, which is OK. But as of Sandy Bridge and
> Ivy Bridge, the chipset can support more than 128.
>
> The BIOS wants to set it to auto.
> The BIOS guys claim that other OSes are OK with auto and that only
> Linux has a problem.

I don't understand the argument the BIOS guys are making.

1.  If the BIOS sets MPS=128 for all devices all the time, everything
should always work.  Performance won't be optimal, but it should
always work.

This requires no OS support at all, so the current Linux default of
doing nothing is fine.

2.  If the BIOS sets MPS to something larger than 128 for hardwired
devices only (where no hotplug is possible), the BIOS either knows
that the root complex will split peer-to-peer packets as required (sec
1.3.1), or it assumes there will be no peer-to-peer traffic between
hierarchies.

This should also work fine with no OS support, unless we do
peer-to-peer traffic and the root complex doesn't split packets.

3.  If the BIOS sets MPS to something larger than 128 in a path that
leads to a hotplug slot, the BIOS assumes the OS actively manages MPS
for hotplug.

This requires OS support because if we hot-add a device that only
supports MPS=128, the OS must reprogram the upstream path before using
the device.
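
The three cases above can be sketched as a small classifier. This is only
an illustration under simplifying assumptions: a path is modeled as an
array of ports from the root down to a slot, and the port struct and
function names are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one port on the path from the root to a slot. */
struct port {
	int mps;		/* programmed Max Payload Size in bytes */
	bool hotplug_slot;	/* true if this port is a hot-pluggable slot */
};

enum bios_case {
	CASE_1_ALL_128 = 1,	/* MPS=128 everywhere: no OS support needed */
	CASE_2_LARGE_HARDWIRED,	/* MPS>128 only where no hotplug is possible */
	CASE_3_LARGE_HOTPLUG,	/* MPS>128 on a path that leads to a hotplug slot */
};

static enum bios_case classify_path(const struct port *path, size_t n)
{
	bool any_large = false, any_hotplug = false;

	for (size_t i = 0; i < n; i++) {
		assert(path[i].mps >= 128);	/* 128 is the smallest legal MPS */
		if (path[i].mps > 128)
			any_large = true;
		if (path[i].hotplug_slot)
			any_hotplug = true;
	}

	if (!any_large)
		return CASE_1_ALL_128;
	return any_hotplug ? CASE_3_LARGE_HOTPLUG : CASE_2_LARGE_HARDWIRED;
}

/*
 * A hot-added device powers up with MPS=128. Only case 3 needs the OS to
 * reprogram MPS before such a device can be used safely.
 */
static bool needs_os_mps_support(const struct port *path, size_t n)
{
	return classify_path(path, n) == CASE_3_LARGE_HOTPLUG;
}
```

The peer-to-peer caveat of case 2 (whether the root complex splits packets)
is not modeled here.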

I don't know what the BIOS "auto" setting means, but it must mean
something in case 3, because that's the only case where OS support is
required.  But if the OS is smart enough to manage MPS for hot-added
devices, why can't the OS also program MPS for the whole system at
boot-time?

That's why I don't understand what BIOS wants to do.  It sounds like
they want the performance benefit of larger MPS for devices present in
hot-plug slots at boot-time, even if the OS doesn't actively manage
MPS and things blow up if that device is replaced with one that
supports a smaller MPS.  That choice doesn't make sense.

In case 3, with a non-MPS-aware OS, you get better performance for a
while, but blow up if a card is replaced.  And with an MPS-aware OS,
there should be no advantage to case 3: the OS should be able to get
good performance by programming MPS itself, even without help from the
BIOS.

Bjorn


Thread overview: 17+ messages
2013-08-22  3:24 [PATCH v8 0/6] Update device MPS Yijing Wang
2013-08-22  3:24 ` [PATCH v8 1/6] PCI: Drop "PCI-E" prefix from Max Payload Size message Yijing Wang
2013-08-22  3:24 ` [PATCH v8 2/6] PCI: Simplify pcie_bus_configure_settings() interface Yijing Wang
2013-08-22  3:24 ` [PATCH v8 3/6] PCI: Remove unnecessary check for pcie_get_mps() failure Yijing Wang
2013-08-22  3:24 ` [PATCH v8 4/6] PCI: Simplify MPS test for Downstream Port Yijing Wang
2013-08-22  3:24 ` [PATCH v8 5/6] PCI: Don't restrict MPS for slots below Root Ports Yijing Wang
2013-08-22  3:24 ` [PATCH v8 6/6] PCI: update device mps when doing pci hotplug Yijing Wang
2013-08-22 18:18   ` Bjorn Helgaas
2013-08-26  3:42     ` Yijing Wang
2013-08-26 21:33       ` Bjorn Helgaas
2013-08-27  0:39         ` Yinghai Lu
2013-08-27  1:49         ` Yijing Wang
2013-08-29 21:09 Bjorn Helgaas
2013-08-29 21:47 ` Yinghai Lu
2013-08-29 22:22   ` Bjorn Helgaas
2013-08-29 22:46     ` Yinghai Lu
2013-08-30 15:41       ` Bjorn Helgaas
