linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI: update device mps when doing pci hotplug
@ 2014-07-29  8:17 Yijing Wang
  2014-07-29 16:18 ` Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Yijing Wang @ 2014-07-29  8:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Jordan_Hargrave, keith.busch, jon.mason, Yijing Wang,
	Jon Mason

Currently we don't update device's mps value when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally. This issue was found in huawei 5885 server
and Dell R620 server. And if we run the platform with windows,
this problem is gone. This patch try to update the hot added
device mps equal to its parent mps, if device mpss < parent mps,
print warning.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Keith Busch <keith.busch@intel.com>
Reported-by: Jordan_Hargrave@Dell.com
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
 drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
 		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
 }
 
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ * @dev: PCI device to set
+ * 
+ * After device hot add, mps will be set to default(128B), But the 
+ * upstream port device's mps may be larger than 128B which was set 
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+	int mps, p_mps, mpss;
+	struct pci_dev *parent;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self 
+			|| !dev->bus->self->is_hotplug_bridge)
+		return;
+	
+	parent = dev->bus->self;
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(parent);
+
+	if (mps >= p_mps)
+		return;
+
+	mpss = 128 << dev->pcie_mpss;
+	if (mpss < p_mps) {
+		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+				mpss, p_mps);
+		return;
+	}
+
+	pcie_write_mps(dev, p_mps);
+	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
+			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
 static void pcie_bus_detect_mps(struct pci_dev *dev)
 {
 	struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 		return 0;
 
 	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+		pcie_bus_update_set(dev);
 		pcie_bus_detect_mps(dev);
 		return 0;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29  8:17 [PATCH] PCI: update device mps when doing pci hotplug Yijing Wang
@ 2014-07-29 16:18 ` Alex Williamson
  2014-07-29 16:30   ` Keith Busch
  2014-07-30  3:27   ` Yijing Wang
  2014-07-30  3:33 ` Ethan Zhao
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 42+ messages in thread
From: Alex Williamson @ 2014-07-29 16:18 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan_Hargrave, keith.busch,
	jon.mason, Jon Mason

On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally.

Apologies if we rehash some previously discussed topics while I try to
cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
The device should be functional with a lower mps setting, right?

>  This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>  
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + * 
> + * After device hot add, mps will be set to default(128B), But the 
> + * upstream port device's mps may be larger than 128B which was set 
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self 
> +			|| !dev->bus->self->is_hotplug_bridge)
> +		return;
> +	
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(parent);
> +
> +	if (mps >= p_mps)
> +		return;
> +
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss < p_mps) {
> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +				mpss, p_mps);
> +		return;
> +	}
> +
> +	pcie_write_mps(dev, p_mps);
> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}

So if the device mps is less than the parent mps and the device supports
the parent mps, we update the device.  If the device cannot support the
parent mps, warn.  Why do we bypass the opportunity to reduce the device
mps if it exceeds the parent mps?

> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  		return 0;
>  
>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		pcie_bus_update_set(dev);
>  		pcie_bus_detect_mps(dev);
>  		return 0;
>  	}

pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
redundancy, can't we merge this new functionality into the existing
function?  Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
be adding code which would imply PCIE_BUS_PERFORMANCE since we're
bringing the device up to an optimal mps to match the parent.  Is there
a simpler solution to simply downgrade the dev_warn in
pcie_bus_detect_mps() to dev_info and change the text?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29 16:18 ` Alex Williamson
@ 2014-07-29 16:30   ` Keith Busch
  2014-07-29 16:42     ` Alex Williamson
  2014-07-30  3:27   ` Yijing Wang
  1 sibling, 1 reply; 42+ messages in thread
From: Keith Busch @ 2014-07-29 16:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yijing Wang, Bjorn Helgaas, linux-pci, Jordan_Hargrave,
	keith.busch, jon.mason, Jon Mason

On Tue, 29 Jul 2014, Alex Williamson wrote:
> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally.
>
> Apologies if we rehash some previously discussed topics while I try to
> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> The device should be functional with a lower mps setting, right?

You'd think so, but some platforms don't work. A pci-e trace showed
TLPs exceeding MPS when parent device at 256B and the end device left
at 128B. Even if that's a platform bug, I think we still want it to work.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29 16:30   ` Keith Busch
@ 2014-07-29 16:42     ` Alex Williamson
  2014-07-29 19:04       ` Keith Busch
  2014-07-30  3:35       ` Yijing Wang
  0 siblings, 2 replies; 42+ messages in thread
From: Alex Williamson @ 2014-07-29 16:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: Yijing Wang, Bjorn Helgaas, linux-pci, Jordan_Hargrave,
	jon.mason, Jon Mason

On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
> On Tue, 29 Jul 2014, Alex Williamson wrote:
> > On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
> >> Currently we don't update device's mps value when doing
> >> pci device hot-add. The hot-added device's mps will be set
> >> to default value (128B). But the upstream port device's mps
> >> may be larger than 128B which was set by firmware during
> >> system bootup. In this case the new added device may not
> >> work normally.
> >
> > Apologies if we rehash some previously discussed topics while I try to
> > cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> > The device should be functional with a lower mps setting, right?
> 
> You'd think so, but some platforms don't work. A pci-e trace showed
> TLPs exceeding MPS when parent device at 256B and the end device left
> at 128B. Even if that's a platform bug, I think we still want it to work.

But if it's a platform bug for a non-compliant device, should it be
handled as a quirk rather than standard configuration?  Thanks,

Alex



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29 16:42     ` Alex Williamson
@ 2014-07-29 19:04       ` Keith Busch
  2014-07-30  3:35       ` Yijing Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Keith Busch @ 2014-07-29 19:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Keith Busch, Yijing Wang, Bjorn Helgaas, linux-pci,
	Jordan_Hargrave, Jon Mason

On Tue, 29 Jul 2014, Alex Williamson wrote:
> On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
>> On Tue, 29 Jul 2014, Alex Williamson wrote:
>>> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally.
>>>
>>> Apologies if we rehash some previously discussed topics while I try to
>>> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
>>> The device should be functional with a lower mps setting, right?
>>
>> You'd think so, but some platforms don't work. A pci-e trace showed
>> TLPs exceeding MPS when parent device at 256B and the end device left
>> at 128B. Even if that's a platform bug, I think we still want it to work.
>
> But if it's a platform bug for a non-compliant device, should it be
> handled as a quirk rather than standard configuration?  Thanks,

I'm not even sure it is a platform bug, but that was just a guess. The
way the devices are configured, it appears they behave inline with the
spec (from table 7-13):

   "
   Max_Payload_size -- This field sets maximum TLP payload size for the
   Function. As a Receiver, the Function must handle TLPs as large as the set
   value. As a Transmitter, the Function must not generate TLPs exceeding the
   set value.
   "

It sounds like it allows a transmitter to generate a TLP that the receiver
can't handle, but I don't know if that was the intent. :)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29 16:18 ` Alex Williamson
  2014-07-29 16:30   ` Keith Busch
@ 2014-07-30  3:27   ` Yijing Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  3:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Bjorn Helgaas, linux-pci, Jordan_Hargrave, keith.busch,
	jon.mason, Jon Mason

On 2014/7/30 0:18, Alex Williamson wrote:
> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally.
> 
> Apologies if we rehash some previously discussed topics while I try to
> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
> The device should be functional with a lower mps setting, right?

No, the device can not work, because some pcie tlp packets will be discarded.
Sorry for my poor English.
> 
>>  This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>  
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + * 
>> + * After device hot add, mps will be set to default(128B), But the 
>> + * upstream port device's mps may be larger than 128B which was set 
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +	int mps, p_mps, mpss;
>> +	struct pci_dev *parent;
>> +
>> +	if (!pci_is_pcie(dev) || !dev->bus->self 
>> +			|| !dev->bus->self->is_hotplug_bridge)
>> +		return;
>> +	
>> +	parent = dev->bus->self;
>> +	mps = pcie_get_mps(dev);
>> +	p_mps = pcie_get_mps(parent);
>> +
>> +	if (mps >= p_mps)
>> +		return;
>> +
>> +	mpss = 128 << dev->pcie_mpss;
>> +	if (mpss < p_mps) {
>> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +				mpss, p_mps);
>> +		return;
>> +	}
>> +
>> +	pcie_write_mps(dev, p_mps);
>> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
>> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
> 
> So if the device mps is less than the parent mps and the device supports
> the parent mps, we update the device.  If the device cannot support the
> parent mps, warn.  Why do we bypass the opportunity to reduce the device
> mps if it exceeds the parent mps?

Exactly, the device's mps will never larger than its parent's. That's unexpected, so we leave it.

> 
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>  	struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>  		return 0;
>>  
>>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +		pcie_bus_update_set(dev);
>>  		pcie_bus_detect_mps(dev);
>>  		return 0;
>>  	}
> 
> pcie_bus_update_set() and pcie_bus_detect_mps() have a lot of
> redundancy, can't we merge this new functionality into the existing
> function? 

OK, will do.

> Also, we're in the PCIE_BUS_TUNE_OFF branch, but we seem to
> be adding code which would imply PCIE_BUS_PERFORMANCE since we're
> bringing the device up to an optimal mps to match the parent.  Is there
> a simpler solution to simply downgrade the dev_warn in
> pcie_bus_detect_mps() to dev_info and change the text?  Thanks,

We just to adjust the device's mps to make the device can work, not for an
optimal mps.

> 
> Alex
> 
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29  8:17 [PATCH] PCI: update device mps when doing pci hotplug Yijing Wang
  2014-07-29 16:18 ` Alex Williamson
@ 2014-07-30  3:33 ` Ethan Zhao
  2014-07-30  3:42   ` Yijing Wang
  2014-07-30  6:26 ` Ethan Zhao
  2014-09-03 22:42 ` Bjorn Helgaas
  3 siblings, 1 reply; 42+ messages in thread
From: Ethan Zhao @ 2014-07-30  3:33 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan_Hargrave, keith.busch,
	jon.mason, Jon Mason

Yijing,
    Seems you want to workaround platform bug in generic PCIe hotplug
code, while not specific to some platforms or devices, that is not
safe/fair to other vendor's platform.
and the updating to MPS of device is out of PCIe specification.
    So the best way is to fix within platform, at least, any platform
specific way in Linux code.

Thanks,
Ethan

On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + *
> + * After device hot add, mps will be set to default(128B), But the
> + * upstream port device's mps may be larger than 128B which was set
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +       int mps, p_mps, mpss;
> +       struct pci_dev *parent;
> +
> +       if (!pci_is_pcie(dev) || !dev->bus->self
> +                       || !dev->bus->self->is_hotplug_bridge)
> +               return;
> +
> +       parent = dev->bus->self;
> +       mps = pcie_get_mps(dev);
> +       p_mps = pcie_get_mps(parent);
> +
> +       if (mps >= p_mps)
> +               return;
> +
> +       mpss = 128 << dev->pcie_mpss;
> +       if (mpss < p_mps) {
> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +                               mpss, p_mps);
> +               return;
> +       }
> +
> +       pcie_write_mps(dev, p_mps);
> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>         struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>                 return 0;
>
>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +               pcie_bus_update_set(dev);
>                 pcie_bus_detect_mps(dev);
>                 return 0;
>         }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29 16:42     ` Alex Williamson
  2014-07-29 19:04       ` Keith Busch
@ 2014-07-30  3:35       ` Yijing Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  3:35 UTC (permalink / raw)
  To: Alex Williamson, Keith Busch
  Cc: Bjorn Helgaas, linux-pci, Jordan_Hargrave, jon.mason, Jon Mason

On 2014/7/30 0:42, Alex Williamson wrote:
> On Tue, 2014-07-29 at 10:30 -0600, Keith Busch wrote:
>> On Tue, 29 Jul 2014, Alex Williamson wrote:
>>> On Tue, 2014-07-29 at 16:17 +0800, Yijing Wang wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally.
>>>
>>> Apologies if we rehash some previously discussed topics while I try to
>>> cover for Bjorn while he's out.  By "normally", do you mean "optimally"?
>>> The device should be functional with a lower mps setting, right?
>>
>> You'd think so, but some platforms don't work. A pci-e trace showed
>> TLPs exceeding MPS when parent device at 256B and the end device left
>> at 128B. Even if that's a platform bug, I think we still want it to work.
> 
> But if it's a platform bug for a non-compliant device, should it be
> handled as a quirk rather than standard configuration?  Thanks,

This is not a platform bug, in PCIe 3.0 spec 7.8.4, page 618, it said
Software can control this to achieve some purposes.


IMPLEMENTATION NOTE
Use of Max_Payload_Size
The Max_Payload_Size mechanism allows software to control the maximum payload in packets sent
by Endpoints to balance latency versus bandwidth trade-offs, particularly for isochronous traffic.
If software chooses to program the Max_Payload_Size of various System Elements to non-default
values, it must take care to ensure that each packet does not exceed the Max_Payload_Size
parameter of any System Element along the packet's path.  Otherwise, the packet will be rejected by  20
the System Element whose Max_Payload_Size parameter is too small.
Discussion of specific algorithms used to configure Max_Payload_Size to meet this requirement is
beyond the scope of this specification, but software should base its algorithm upon factors such as
the following:
‰  the Max_Payload_Size capability of each System Element within a hierarchy  25
‰  awareness of when System Elements are added or removed through Hot-Plug operations
‰  knowing which System Elements send packets to each other, what type of traffic is carried, what
type of transactions are used, or if packet sizes are constrained by other mechanisms

For the case of firmware that configures System Elements in preparation for running legacy
operating system environments, the firmware may need to avoid programming a Max_Payload_Size
above the default of 128 bytes, which is the minimum supported by Endpoints.
For example, if the operating system environment does not comprehend PCI Express, firmware
probably should not program a non-default Max_Payload_Size for a hierarchy that supports Hot- 5
Plug operations.  Otherwise, if no software is present to manage Max_Payload_Size settings when a
new element is added, improper operation may result.  Note that a newly added element may not
even support a Max_Payload_Size setting as large as the rest of the hierarchy, in which case software
may need to deny enabling the new element or reduce the Max_Payload_Size settings of other
elements.

> 
> Alex
> 
> 
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  3:33 ` Ethan Zhao
@ 2014-07-30  3:42   ` Yijing Wang
  2014-07-30  3:58     ` Ethan Zhao
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  3:42 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-pci, Jordan_Hargrave, keith.busch,
	jon.mason, Jon Mason

On 2014/7/30 11:33, Ethan Zhao wrote:
> Yijing,
>     Seems you want to workaround platform bug in generic PCIe hotplug
> code, while not specific to some platforms or devices, that is not
> safe/fair to other vendor's platform.
> and the updating to MPS of device is out of PCIe specification.
>     So the best way is to fix within platform, at least, any platform
> specific way in Linux code.

No, this is not a platform bug, and this is safe, you can refer to my last reply
to Alex.

> 
> Thanks,
> Ethan
> 
> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally. This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + *
>> + * After device hot add, mps will be set to default(128B), But the
>> + * upstream port device's mps may be larger than 128B which was set
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +       int mps, p_mps, mpss;
>> +       struct pci_dev *parent;
>> +
>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>> +                       || !dev->bus->self->is_hotplug_bridge)
>> +               return;
>> +
>> +       parent = dev->bus->self;
>> +       mps = pcie_get_mps(dev);
>> +       p_mps = pcie_get_mps(parent);
>> +
>> +       if (mps >= p_mps)
>> +               return;
>> +
>> +       mpss = 128 << dev->pcie_mpss;
>> +       if (mpss < p_mps) {
>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +                               mpss, p_mps);
>> +               return;
>> +       }
>> +
>> +       pcie_write_mps(dev, p_mps);
>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>         struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>                 return 0;
>>
>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +               pcie_bus_update_set(dev);
>>                 pcie_bus_detect_mps(dev);
>>                 return 0;
>>         }
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  3:42   ` Yijing Wang
@ 2014-07-30  3:58     ` Ethan Zhao
  2014-07-30  4:42       ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Ethan Zhao @ 2014-07-30  3:58 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, <Jordan_Hargrave@dell.com>,
	<keith.busch@intel.com>, <jon.mason@intel.com>,
	Jon Mason



> 在 2014年7月30日,上午11:42,Yijing Wang <wangyijing@huawei.com> 写道:
> 
>> On 2014/7/30 11:33, Ethan Zhao wrote:
>> Yijing,
>>    Seems you want to workaround platform bug in generic PCIe hotplug
>> code, while not specific to some platforms or devices, that is not
>> safe/fair to other vendor's platform.
>> and the updating to MPS of device is out of PCIe specification.
>>    So the best way is to fix within platform, at least, any platform
>> specific way in Linux code.
> 
> No, this is not a platform bug, and this is safe, you can refer to my last reply
> to Alex.
if is not a platform bug, why Didn't other platforms that can do hotplug report such issue ?
If it was safe, is it possible that you have all other vendors verify it ? 
> 
>> 
>> Thanks,
>> Ethan
>> 
>>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Currently we don't update device's mps value when doing
>>> pci device hot-add. The hot-added device's mps will be set
>>> to default value (128B). But the upstream port device's mps
>>> may be larger than 128B which was set by firmware during
>>> system bootup. In this case the new added device may not
>>> work normally. This issue was found in huawei 5885 server
>>> and Dell R620 server. And if we run the platform with windows,
>>> this problem is gone. This patch try to update the hot added
>>> device mps equal to its parent mps, if device mpss < parent mps,
>>> print warning.
>>> 
>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>> Reported-by: Jordan_Hargrave@Dell.com
>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>> Cc: Jon Mason <jdmason@kudzu.us>
>>> ---
>>> drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>> 
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index e3cf8a2..583ca52 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>                dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>> }
>>> 
>>> +/**
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +       int mps, p_mps, mpss;
>>> +       struct pci_dev *parent;
>>> +
>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>> +               return;
>>> +
>>> +       parent = dev->bus->self;
>>> +       mps = pcie_get_mps(dev);
>>> +       p_mps = pcie_get_mps(parent);
>>> +
>>> +       if (mps >= p_mps)
>>> +               return;
>>> +
>>> +       mpss = 128 << dev->pcie_mpss;
>>> +       if (mpss < p_mps) {
>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                               mpss, p_mps);
>>> +               return;
>>> +       }
>>> +
>>> +       pcie_write_mps(dev, p_mps);
>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>> static void pcie_bus_detect_mps(struct pci_dev *dev)
>>> {
>>>        struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>                return 0;
>>> 
>>>        if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +               pcie_bus_update_set(dev);
>>>                pcie_bus_detect_mps(dev);
>>>                return 0;
>>>        }
>>> --
>>> 1.7.1
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> .
> 
> 
> -- 
> Thanks!
> Yijing
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  3:58     ` Ethan Zhao
@ 2014-07-30  4:42       ` Yijing Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  4:42 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-pci, <Jordan_Hargrave@dell.com>,
	<keith.busch@intel.com>, <jon.mason@intel.com>,
	Jon Mason

On 2014/7/30 11:58, Ethan Zhao wrote:
> 
> 
>> 在 2014年7月30日,上午11:42,Yijing Wang <wangyijing@huawei.com> 写道:
>>
>>> On 2014/7/30 11:33, Ethan Zhao wrote:
>>> Yijing,
>>>    Seems you want to workaround platform bug in generic PCIe hotplug
>>> code, while not specific to some platforms or devices, that is not
>>> safe/fair to other vendor's platform.
>>> and the updating to MPS of device is out of PCIe specification.
>>>    So the best way is to fix within platform, at least, any platform
>>> specific way in Linux code.
>>
>> No, this is not a platform bug, and this is safe, you can refer to my last reply
>> to Alex.
> if is not a platform bug, why Didn't other platforms that can do hotplug report such issue ?
> If it was safe, is it possible that you have all other vendors verify it ? 

This issue was triggered by firmware set the device mps to 256B. It's not related to platform
hardware.

I think this is safe, because I only modified the device newly hot-added, this device is not enabled yet.

Thanks!
Yijing.


>>>
>>>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Currently we don't update device's mps value when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally. This issue was found in huawei 5885 server
>>>> and Dell R620 server. And if we run the platform with windows,
>>>> this problem is gone. This patch try to update the hot added
>>>> device mps equal to its parent mps, if device mpss < parent mps,
>>>> print warning.
>>>>
>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>> ---
>>>> drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index e3cf8a2..583ca52 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>                dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>> }
>>>>
>>>> +/**
>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>> + * @dev: PCI device to set
>>>> + *
>>>> + * After device hot add, mps will be set to default(128B), But the
>>>> + * upstream port device's mps may be larger than 128B which was set
>>>> + * by firmware during system bootup. Then we should update the device
>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>> + */
>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>> +{
>>>> +       int mps, p_mps, mpss;
>>>> +       struct pci_dev *parent;
>>>> +
>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>> +               return;
>>>> +
>>>> +       parent = dev->bus->self;
>>>> +       mps = pcie_get_mps(dev);
>>>> +       p_mps = pcie_get_mps(parent);
>>>> +
>>>> +       if (mps >= p_mps)
>>>> +               return;
>>>> +
>>>> +       mpss = 128 << dev->pcie_mpss;
>>>> +       if (mpss < p_mps) {
>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>> +                               mpss, p_mps);
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       pcie_write_mps(dev, p_mps);
>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>> +}
>>>> +
>>>> static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>> {
>>>>        struct pci_dev *bridge = dev->bus->self;
>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>                return 0;
>>>>
>>>>        if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +               pcie_bus_update_set(dev);
>>>>                pcie_bus_detect_mps(dev);
>>>>                return 0;
>>>>        }
>>>> --
>>>> 1.7.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> .
>>
>>
>> -- 
>> Thanks!
>> Yijing
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29  8:17 [PATCH] PCI: update device mps when doing pci hotplug Yijing Wang
  2014-07-29 16:18 ` Alex Williamson
  2014-07-30  3:33 ` Ethan Zhao
@ 2014-07-30  6:26 ` Ethan Zhao
  2014-07-30  6:57   ` Yijing Wang
  2014-09-03 22:42 ` Bjorn Helgaas
  3 siblings, 1 reply; 42+ messages in thread
From: Ethan Zhao @ 2014-07-30  6:26 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>, <jon.mason@intel.com>,
	Jon Mason

On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671

http://marc.info/?l=e1000-devel&m=134182518500774&w=2

The bug was reported by Joe.jin@oracle.com. the related hardware
is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
and the the issue is definitely a BIOS bug. not related to the hotplug
code path. was fixed by BIOS.

Thanks,
Ethan

> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + *
> + * After device hot add, mps will be set to default(128B), But the
> + * upstream port device's mps may be larger than 128B which was set
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +       int mps, p_mps, mpss;
> +       struct pci_dev *parent;
> +
> +       if (!pci_is_pcie(dev) || !dev->bus->self
> +                       || !dev->bus->self->is_hotplug_bridge)
> +               return;
> +
> +       parent = dev->bus->self;
> +       mps = pcie_get_mps(dev);
> +       p_mps = pcie_get_mps(parent);
> +
> +       if (mps >= p_mps)
> +               return;
> +
> +       mpss = 128 << dev->pcie_mpss;
> +       if (mpss < p_mps) {
> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +                               mpss, p_mps);
> +               return;
> +       }
> +
> +       pcie_write_mps(dev, p_mps);
> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>         struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>                 return 0;
>
>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +               pcie_bus_update_set(dev);
>                 pcie_bus_detect_mps(dev);
>                 return 0;
>         }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  6:26 ` Ethan Zhao
@ 2014-07-30  6:57   ` Yijing Wang
  2014-07-30  7:17     ` Ethan Zhao
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  6:57 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>, <jon.mason@intel.com>,
	Jon Mason

On 2014/7/30 14:26, Ethan Zhao wrote:
> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Currently we don't update device's mps value when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally. This issue was found in huawei 5885 server
>> and Dell R620 server. And if we run the platform with windows,
>> this problem is gone. This patch try to update the hot added
>> device mps equal to its parent mps, if device mpss < parent mps,
>> print warning.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> 
> http://marc.info/?l=e1000-devel&m=134182518500774&w=2
> 
> The bug was reported by Joe.jin@oracle.com. the related hardware
> is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
> and the the issue is definitely a BIOS bug. not related to the hotplug
> code path. was fixed by BIOS.
> 

Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
But that issue is not same as this one, Keith and Jordan found the issue
after hot-plug. And my patch only modify the hotplug slot connected device.

In my idea, make the device work is important, because these platforms with windows
can run happy, why linux leave this issue to BIOS.

Anyway, this is my personal opinion.

Thanks!
Yijing.

> 
>> Reported-by: Keith Busch <keith.busch@intel.com>
>> Reported-by: Jordan_Hargrave@Dell.com
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index e3cf8a2..583ca52 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>  }
>>
>> +/**
>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + *
>> + * After device hot add, mps will be set to default(128B), But the
>> + * upstream port device's mps may be larger than 128B which was set
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +       int mps, p_mps, mpss;
>> +       struct pci_dev *parent;
>> +
>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>> +                       || !dev->bus->self->is_hotplug_bridge)
>> +               return;
>> +
>> +       parent = dev->bus->self;
>> +       mps = pcie_get_mps(dev);
>> +       p_mps = pcie_get_mps(parent);
>> +
>> +       if (mps >= p_mps)
>> +               return;
>> +
>> +       mpss = 128 << dev->pcie_mpss;
>> +       if (mpss < p_mps) {
>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +                               mpss, p_mps);
>> +               return;
>> +       }
>> +
>> +       pcie_write_mps(dev, p_mps);
>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>         struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>                 return 0;
>>
>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +               pcie_bus_update_set(dev);
>>                 pcie_bus_detect_mps(dev);
>>                 return 0;
>>         }
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  6:57   ` Yijing Wang
@ 2014-07-30  7:17     ` Ethan Zhao
  2014-07-30  8:13       ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Ethan Zhao @ 2014-07-30  7:17 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>, <jon.mason@intel.com>,
	Jon Mason

On Wed, Jul 30, 2014 at 2:57 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> On 2014/7/30 14:26, Ethan Zhao wrote:
>> On Tue, Jul 29, 2014 at 4:17 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Currently we don't update device's mps value when doing
>>> pci device hot-add. The hot-added device's mps will be set
>>> to default value (128B). But the upstream port device's mps
>>> may be larger than 128B which was set by firmware during
>>> system bootup. In this case the new added device may not
>>> work normally. This issue was found in huawei 5885 server
>>> and Dell R620 server. And if we run the platform with windows,
>>> this problem is gone. This patch try to update the hot added
>>> device mps equal to its parent mps, if device mpss < parent mps,
>>> print warning.
>>>
>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
>>
>> http://marc.info/?l=e1000-devel&m=134182518500774&w=2
>>
>> The bug was reported by Joe.jin@oracle.com. the related hardware
>> is SUN FIRE X2270 M2, it is not a hotplug sever except SAS HDD,
>> and the the issue is definitely a BIOS bug. not related to the hotplug
>> code path. was fixed by BIOS.
>>
>
> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
> But that issue is not same as this one, Keith and Jordan found the issue
> after hot-plug. And my patch only modify the hotplug slot connected device.
>
> In my idea, make the device work is important, because these platforms with windows
> can run happy, why linux leave this issue to BIOS.

   That is a reason to make it works with Linux, but does your
platform have _HPX from ACPI for those hot-added back devices ?  if it
has, maybe windows could apply _HPX to configure those devices and
work well.

   Maybe you should check with _HPX .  That is a generic way to import
configuration from firmware when hot-add device.

  Please see also
  chapter 6.2.8 _HPX (Hot Plug Parameter Extensions) of ACPI spec 4&5

  Thanks,
  Ethan


>
> Anyway, this is my personal opinion.
>
> Thanks!
> Yijing.
>
>>
>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>> Reported-by: Jordan_Hargrave@Dell.com
>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>> Cc: Jon Mason <jdmason@kudzu.us>
>>> ---
>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index e3cf8a2..583ca52 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>  }
>>>
>>> +/**
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +       int mps, p_mps, mpss;
>>> +       struct pci_dev *parent;
>>> +
>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>> +               return;
>>> +
>>> +       parent = dev->bus->self;
>>> +       mps = pcie_get_mps(dev);
>>> +       p_mps = pcie_get_mps(parent);
>>> +
>>> +       if (mps >= p_mps)
>>> +               return;
>>> +
>>> +       mpss = 128 << dev->pcie_mpss;
>>> +       if (mpss < p_mps) {
>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                               mpss, p_mps);
>>> +               return;
>>> +       }
>>> +
>>> +       pcie_write_mps(dev, p_mps);
>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>  {
>>>         struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>                 return 0;
>>>
>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +               pcie_bus_update_set(dev);
>>>                 pcie_bus_detect_mps(dev);
>>>                 return 0;
>>>         }
>>> --
>>> 1.7.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  7:17     ` Ethan Zhao
@ 2014-07-30  8:13       ` Yijing Wang
  2014-07-30  8:38         ` Ethan Zhao
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  8:13 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>,
	Jon Mason

>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>> But that issue is not same as this one, Keith and Jordan found the issue
>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>
>> In my idea, make the device work is important, because these platforms with windows
>> can run happy, why linux leave this issue to BIOS.
> 
>    That is a reason to make it works with Linux, but does your
> platform have _HPX from ACPI for those hot-added back devices ?  if it
> has, maybe windows could apply _HPX to configure those devices and
> work well.
> 

I checked DSDT table exported from my server, but no "_HPX" found.
Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
driver won't touch ACPI methods like "_HPX".

Thanks!
Yijing.

>>
>>>
>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>> ---
>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index e3cf8a2..583ca52 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>  }
>>>>
>>>> +/**
>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>> + * @dev: PCI device to set
>>>> + *
>>>> + * After device hot add, mps will be set to default(128B), But the
>>>> + * upstream port device's mps may be larger than 128B which was set
>>>> + * by firmware during system bootup. Then we should update the device
>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>> + */
>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>> +{
>>>> +       int mps, p_mps, mpss;
>>>> +       struct pci_dev *parent;
>>>> +
>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>> +               return;
>>>> +
>>>> +       parent = dev->bus->self;
>>>> +       mps = pcie_get_mps(dev);
>>>> +       p_mps = pcie_get_mps(parent);
>>>> +
>>>> +       if (mps >= p_mps)
>>>> +               return;
>>>> +
>>>> +       mpss = 128 << dev->pcie_mpss;
>>>> +       if (mpss < p_mps) {
>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>> +                               mpss, p_mps);
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       pcie_write_mps(dev, p_mps);
>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>> +}
>>>> +
>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>  {
>>>>         struct pci_dev *bridge = dev->bus->self;
>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>                 return 0;
>>>>
>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +               pcie_bus_update_set(dev);
>>>>                 pcie_bus_detect_mps(dev);
>>>>                 return 0;
>>>>         }
>>>> --
>>>> 1.7.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  8:13       ` Yijing Wang
@ 2014-07-30  8:38         ` Ethan Zhao
  2014-07-30  9:17           ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Ethan Zhao @ 2014-07-30  8:38 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>,
	Jon Mason

On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>> But that issue is not same as this one, Keith and Jordan found the issue
>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>
>>> In my idea, make the device work is important, because these platforms with windows
>>> can run happy, why linux leave this issue to BIOS.
>>
>>    That is a reason to make it works with Linux, but does your
>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>> has, maybe windows could apply _HPX to configure those devices and
>> work well.
>>
>
> I checked DSDT table exported from my server, but no "_HPX" found.

You might check the wrong place...

> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
> driver won't touch ACPI methods like "_HPX".

That is not true, please read the code in pciehp_pci.c

 pciehp_configure_device()
    pci_configure_slot()
       pci_get_hp_params()
           acpi_run_hpx()
               acpi_evaluate_object(handle, "_HPX",...)
 ...

Native pciehp also does acpi parameter importing ... ...

Thank,s
Ethan

>
> Thanks!
> Yijing.
>
>>>
>>>>
>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>> ---
>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>> index e3cf8a2..583ca52 100644
>>>>> --- a/drivers/pci/probe.c
>>>>> +++ b/drivers/pci/probe.c
>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>  }
>>>>>
>>>>> +/**
>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>> + * @dev: PCI device to set
>>>>> + *
>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>> + * by firmware during system bootup. Then we should update the device
>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>> + */
>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>> +{
>>>>> +       int mps, p_mps, mpss;
>>>>> +       struct pci_dev *parent;
>>>>> +
>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>> +               return;
>>>>> +
>>>>> +       parent = dev->bus->self;
>>>>> +       mps = pcie_get_mps(dev);
>>>>> +       p_mps = pcie_get_mps(parent);
>>>>> +
>>>>> +       if (mps >= p_mps)
>>>>> +               return;
>>>>> +
>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>> +       if (mpss < p_mps) {
>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>> +                               mpss, p_mps);
>>>>> +               return;
>>>>> +       }
>>>>> +
>>>>> +       pcie_write_mps(dev, p_mps);
>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>> +}
>>>>> +
>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>  {
>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>                 return 0;
>>>>>
>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>> +               pcie_bus_update_set(dev);
>>>>>                 pcie_bus_detect_mps(dev);
>>>>>                 return 0;
>>>>>         }
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  8:38         ` Ethan Zhao
@ 2014-07-30  9:17           ` Yijing Wang
  2014-07-30 19:41             ` Jordan_Hargrave
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-07-30  9:17 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave,
	<keith.busch@intel.com>,
	Jon Mason

On 2014/7/30 16:38, Ethan Zhao wrote:
> On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>>> But that issue is not same as this one, Keith and Jordan found the issue
>>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>>
>>>> In my idea, make the device work is important, because these platforms with windows
>>>> can run happy, why linux leave this issue to BIOS.
>>>
>>>    That is a reason to make it works with Linux, but does your
>>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>>> has, maybe windows could apply _HPX to configure those devices and
>>> work well.
>>>
>>
>> I checked DSDT table exported from my server, but no "_HPX" found.
> 
> You might check the wrong place...
> 
>> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
>> driver won't touch ACPI methods like "_HPX".
> 
> That is not true, please read the code in pciehp_pci.c
> 
>  pciehp_configure_device()
>     pci_configure_slot()
>        pci_get_hp_params()
>            acpi_run_hpx()
>                acpi_evaluate_object(handle, "_HPX",...)
>  ...
> 

Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.

Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
if we find the device mps is not equal to the cached value, and the device is disabled, restore it.


/* PCI Express Setting Record (Type 2) */
struct hpp_type2 {
	u32 revision;
	u32 unc_err_mask_and;
	u32 unc_err_mask_or;
	u32 unc_err_sever_and;
	u32 unc_err_sever_or;
	u32 cor_err_mask_and;
	u32 cor_err_mask_or;
	u32 adv_err_cap_and;
	u32 adv_err_cap_or;
	u16 pci_exp_devctl_and;
	u16 pci_exp_devctl_or;
	u16 pci_exp_lnkctl_and;
	u16 pci_exp_lnkctl_or;
	u32 sec_unc_err_sever_and;
	u32 sec_unc_err_sever_or;
	u32 sec_unc_err_mask_and;
	u32 sec_unc_err_mask_or;
};

> Native pciehp also does acpi parameter importing ... ...
> 
> Thank,s
> Ethan
> 
>>
>> Thanks!
>> Yijing.
>>
>>>>
>>>>>
>>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>>> ---
>>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>> index e3cf8a2..583ca52 100644
>>>>>> --- a/drivers/pci/probe.c
>>>>>> +++ b/drivers/pci/probe.c
>>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>>> + * @dev: PCI device to set
>>>>>> + *
>>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>>> + * by firmware during system bootup. Then we should update the device
>>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>>> + */
>>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>>> +{
>>>>>> +       int mps, p_mps, mpss;
>>>>>> +       struct pci_dev *parent;
>>>>>> +
>>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>>> +               return;
>>>>>> +
>>>>>> +       parent = dev->bus->self;
>>>>>> +       mps = pcie_get_mps(dev);
>>>>>> +       p_mps = pcie_get_mps(parent);
>>>>>> +
>>>>>> +       if (mps >= p_mps)
>>>>>> +               return;
>>>>>> +
>>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>>> +       if (mpss < p_mps) {
>>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>>> +                               mpss, p_mps);
>>>>>> +               return;
>>>>>> +       }
>>>>>> +
>>>>>> +       pcie_write_mps(dev, p_mps);
>>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>>> +}
>>>>>> +
>>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>>  {
>>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>>                 return 0;
>>>>>>
>>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>> +               pcie_bus_update_set(dev);
>>>>>>                 pcie_bus_detect_mps(dev);
>>>>>>                 return 0;
>>>>>>         }
>>>>>> --
>>>>>> 1.7.1
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks!
>>>> Yijing
>>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30  9:17           ` Yijing Wang
@ 2014-07-30 19:41             ` Jordan_Hargrave
  2014-09-03 19:20               ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Jordan_Hargrave @ 2014-07-30 19:41 UTC (permalink / raw)
  To: wangyijing, ethan.kernel; +Cc: bhelgaas, linux-pci, keith.busch, jdmason

One concern with using _HPX is there doesn't appear to be a failure mechanism...
What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.

The _HPX AML would have to read the parent device MPS which would be pretty ugly.

--jordan hargrave
Dell Enterprise Linux Engineering
________________________________________
From: Yijing Wang [wangyijing@huawei.com]
Sent: Wednesday, July 30, 2014 4:17 AM
To: Ethan Zhao
Cc: Bjorn Helgaas; linux-pci; Hargrave, Jordan; <keith.busch@intel.com>; Jon Mason
Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug

On 2014/7/30 16:38, Ethan Zhao wrote:
> On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>>> But that issue is not same as this one, Keith and Jordan found the issue
>>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>>
>>>> In my idea, make the device work is important, because these platforms with windows
>>>> can run happy, why linux leave this issue to BIOS.
>>>
>>>    That is a reason to make it works with Linux, but does your
>>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>>> has, maybe windows could apply _HPX to configure those devices and
>>> work well.
>>>
>>
>> I checked DSDT table exported from my server, but no "_HPX" found.
>
> You might check the wrong place...
>
>> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
>> driver won't touch ACPI methods like "_HPX".
>
> That is not true, please read the code in pciehp_pci.c
>
>  pciehp_configure_device()
>     pci_configure_slot()
>        pci_get_hp_params()
>            acpi_run_hpx()
>                acpi_evaluate_object(handle, "_HPX",...)
>  ...
>

Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.

Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
if we find the device mps is not equal to the cached value, and the device is disabled, restore it.


/* PCI Express Setting Record (Type 2) */
struct hpp_type2 {
        u32 revision;
        u32 unc_err_mask_and;
        u32 unc_err_mask_or;
        u32 unc_err_sever_and;
        u32 unc_err_sever_or;
        u32 cor_err_mask_and;
        u32 cor_err_mask_or;
        u32 adv_err_cap_and;
        u32 adv_err_cap_or;
        u16 pci_exp_devctl_and;
        u16 pci_exp_devctl_or;
        u16 pci_exp_lnkctl_and;
        u16 pci_exp_lnkctl_or;
        u32 sec_unc_err_sever_and;
        u32 sec_unc_err_sever_or;
        u32 sec_unc_err_mask_and;
        u32 sec_unc_err_mask_or;
};

> Native pciehp also does acpi parameter importing ... ...
>
> Thank,s
> Ethan
>
>>
>> Thanks!
>> Yijing.
>>
>>>>
>>>>>
>>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>>> ---
>>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>> index e3cf8a2..583ca52 100644
>>>>>> --- a/drivers/pci/probe.c
>>>>>> +++ b/drivers/pci/probe.c
>>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>>> + * @dev: PCI device to set
>>>>>> + *
>>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>>> + * by firmware during system bootup. Then we should update the device
>>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>>> + */
>>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>>> +{
>>>>>> +       int mps, p_mps, mpss;
>>>>>> +       struct pci_dev *parent;
>>>>>> +
>>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>>> +               return;
>>>>>> +
>>>>>> +       parent = dev->bus->self;
>>>>>> +       mps = pcie_get_mps(dev);
>>>>>> +       p_mps = pcie_get_mps(parent);
>>>>>> +
>>>>>> +       if (mps >= p_mps)
>>>>>> +               return;
>>>>>> +
>>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>>> +       if (mpss < p_mps) {
>>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>>> +                               mpss, p_mps);
>>>>>> +               return;
>>>>>> +       }
>>>>>> +
>>>>>> +       pcie_write_mps(dev, p_mps);
>>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>>> +}
>>>>>> +
>>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>>  {
>>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>>                 return 0;
>>>>>>
>>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>> +               pcie_bus_update_set(dev);
>>>>>>                 pcie_bus_detect_mps(dev);
>>>>>>                 return 0;
>>>>>>         }
>>>>>> --
>>>>>> 1.7.1
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks!
>>>> Yijing
>>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
>
> .
>


--
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-30 19:41             ` Jordan_Hargrave
@ 2014-09-03 19:20               ` Bjorn Helgaas
  0 siblings, 0 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2014-09-03 19:20 UTC (permalink / raw)
  To: Jordan Hargrave
  Cc: Yijing Wang, Ethan Zhao, linux-pci, Keith Busch, Jon Mason

On Wed, Jul 30, 2014 at 1:41 PM,  <Jordan_Hargrave@dell.com> wrote:
> One concern with using _HPX is there doesn't appear to be a failure mechanism...
> What happens if inserting a device where its maximum possible MPS is < the MPS of the parent device.
>
> The _HPX AML would have to read the parent device MPS which would be pretty ugly.

_HPX should be applied when a device is added via any hotplug driver,
including pciehp, but I don't think it is a reasonable way to fix
this.  The spec (ACPI r5.0, sec 6.2.8.3) allows a Type 2 record to set
arbitrary values in the Device Control register (including MPS), and
that's what the current Linux implementation does.

However, the spec allows OSPM to override any bits deemed necessary.
MPS is a property that is controlled by OSPM and it depends on other
devices in the system, so I don't think it is safe to put an MPS value
from _HPX directly into Device Control.  OSPM should first ensure that
the value is compatible with how it has configured the rest of the
system.

> ________________________________________
> From: Yijing Wang [wangyijing@huawei.com]
> Sent: Wednesday, July 30, 2014 4:17 AM
> To: Ethan Zhao
> Cc: Bjorn Helgaas; linux-pci; Hargrave, Jordan; <keith.busch@intel.com>; Jon Mason
> Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug
>
> On 2014/7/30 16:38, Ethan Zhao wrote:
>> On Wed, Jul 30, 2014 at 4:13 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>>> Yes, that issue is BIOS bug, the mps setting is wrong after system boot up.
>>>>> But that issue is not same as this one, Keith and Jordan found the issue
>>>>> after hot-plug. And my patch only modify the hotplug slot connected device.
>>>>>
>>>>> In my idea, make the device work is important, because these platforms with windows
>>>>> can run happy, why linux leave this issue to BIOS.
>>>>
>>>>    That is a reason to make it works with Linux, but does your
>>>> platform have _HPX from ACPI for those hot-added back devices ?  if it
>>>> has, maybe windows could apply _HPX to configure those devices and
>>>> work well.
>>>>
>>>
>>> I checked DSDT table exported from my server, but no "_HPX" found.
>>
>> You might check the wrong place...
>>
>>> Further more, kernel use pciehp first to support pcie hotplug device. And in pciehp,
>>> driver won't touch ACPI methods like "_HPX".
>>
>> That is not true, please read the code in pciehp_pci.c
>>
>>  pciehp_configure_device()
>>     pci_configure_slot()
>>        pci_get_hp_params()
>>            acpi_run_hpx()
>>                acpi_evaluate_object(handle, "_HPX",...)
>>  ...
>>
>
> Sorry, I make a mistake. You are right, Use the _HPX can fix the issue from BIOS.
> But what confused me is why linux fail to reconfigure mps after hotplug, but windows did it well.
>
> Maybe we can cache the device mps into pci_dev, every time we call pcie_bus_configure_settings(),
> if we find the device mps is not equal to the cached value, and the device is disabled, restore it.
>
>
> /* PCI Express Setting Record (Type 2) */
> struct hpp_type2 {
>         u32 revision;
>         u32 unc_err_mask_and;
>         u32 unc_err_mask_or;
>         u32 unc_err_sever_and;
>         u32 unc_err_sever_or;
>         u32 cor_err_mask_and;
>         u32 cor_err_mask_or;
>         u32 adv_err_cap_and;
>         u32 adv_err_cap_or;
>         u16 pci_exp_devctl_and;
>         u16 pci_exp_devctl_or;
>         u16 pci_exp_lnkctl_and;
>         u16 pci_exp_lnkctl_or;
>         u32 sec_unc_err_sever_and;
>         u32 sec_unc_err_sever_or;
>         u32 sec_unc_err_mask_and;
>         u32 sec_unc_err_mask_or;
> };
>
>> Native pciehp also does acpi parameter importing ... ...
>>
>> Thank,s
>> Ethan
>>
>>>
>>> Thanks!
>>> Yijing.
>>>
>>>>>
>>>>>>
>>>>>>> Reported-by: Keith Busch <keith.busch@intel.com>
>>>>>>> Reported-by: Jordan_Hargrave@Dell.com
>>>>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>>>>> ---
>>>>>>>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>>>>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>>>> index e3cf8a2..583ca52 100644
>>>>>>> --- a/drivers/pci/probe.c
>>>>>>> +++ b/drivers/pci/probe.c
>>>>>>> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>>>>>>>                 dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>>>>>>>  }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>>>>>> + * @dev: PCI device to set
>>>>>>> + *
>>>>>>> + * After device hot add, mps will be set to default(128B), But the
>>>>>>> + * upstream port device's mps may be larger than 128B which was set
>>>>>>> + * by firmware during system bootup. Then we should update the device
>>>>>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>>>>>> + */
>>>>>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>>>>>> +{
>>>>>>> +       int mps, p_mps, mpss;
>>>>>>> +       struct pci_dev *parent;
>>>>>>> +
>>>>>>> +       if (!pci_is_pcie(dev) || !dev->bus->self
>>>>>>> +                       || !dev->bus->self->is_hotplug_bridge)
>>>>>>> +               return;
>>>>>>> +
>>>>>>> +       parent = dev->bus->self;
>>>>>>> +       mps = pcie_get_mps(dev);
>>>>>>> +       p_mps = pcie_get_mps(parent);
>>>>>>> +
>>>>>>> +       if (mps >= p_mps)
>>>>>>> +               return;
>>>>>>> +
>>>>>>> +       mpss = 128 << dev->pcie_mpss;
>>>>>>> +       if (mpss < p_mps) {
>>>>>>> +               dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>>>>>> +                               "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>>>>>> +                               mpss, p_mps);
>>>>>>> +               return;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       pcie_write_mps(dev, p_mps);
>>>>>>> +       dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>>>>>> +                       pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>>>>>  {
>>>>>>>         struct pci_dev *bridge = dev->bus->self;
>>>>>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>>>>                 return 0;
>>>>>>>
>>>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>>> +               pcie_bus_update_set(dev);
>>>>>>>                 pcie_bus_detect_mps(dev);
>>>>>>>                 return 0;
>>>>>>>         }
>>>>>>> --
>>>>>>> 1.7.1
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks!
>>>>> Yijing
>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-07-29  8:17 [PATCH] PCI: update device mps when doing pci hotplug Yijing Wang
                   ` (2 preceding siblings ...)
  2014-07-30  6:26 ` Ethan Zhao
@ 2014-09-03 22:42 ` Bjorn Helgaas
  2014-09-04  6:12   ` Yijing Wang
  3 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2014-09-03 22:42 UTC (permalink / raw)
  To: Yijing Wang; +Cc: linux-pci, Jordan_Hargrave, keith.busch, jon.mason, Jon Mason

On Tue, Jul 29, 2014 at 04:17:57PM +0800, Yijing Wang wrote:
> Currently we don't update device's mps value when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This issue was found in huawei 5885 server
> and Dell R620 server. And if we run the platform with windows,
> this problem is gone. This patch try to update the hot added
> device mps equal to its parent mps, if device mpss < parent mps,
> print warning.
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Keith Busch <keith.busch@intel.com>
> Reported-by: Jordan_Hargrave@Dell.com
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e3cf8a2..583ca52 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
>  		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
>  }
>  
> +/**
> + * pcie_bus_update_set - update device mps when device doing hot-add
> + * @dev: PCI device to set
> + * 
> + * After device hot add, mps will be set to default(128B), But the 
> + * upstream port device's mps may be larger than 128B which was set 
> + * by firmware during system bootup. Then we should update the device
> + * mps to equal to its parent mps, Or the device can not work normally.
> + */
> +static void pcie_bus_update_set(struct pci_dev *dev)
> +{
> +	int mps, p_mps, mpss;
> +	struct pci_dev *parent;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self 
> +			|| !dev->bus->self->is_hotplug_bridge)

Part of this looks redundant because pcie_bus_configure_set() already
checks pci_is_pcie().  And I don't know why we need to test
is_hotplug_bridge here; MPS settings need to be consistent regardless of
whether the upstream bridge supports hotplug.

> +		return;
> +	
> +	parent = dev->bus->self;
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(parent);
> +
> +	if (mps >= p_mps)
> +		return;
> +
> +	mpss = 128 << dev->pcie_mpss;
> +	if (mpss < p_mps) {
> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
> +				mpss, p_mps);
> +		return;

Since we can't configure the new device correctly, we really shouldn't
allow a driver to bind to it.  The current design doesn't have much
provision for doing that, so warning is probably all we can do.

> +	}
> +
> +	pcie_write_mps(dev, p_mps);
> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> +}
> +
>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge = dev->bus->self;
> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  		return 0;
>  
>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
> +		pcie_bus_update_set(dev);

You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
problem occur for other pcie_bus_config settings?

>  		pcie_bus_detect_mps(dev);
>  		return 0;
>  	}

I have some long-term ideas here (below), but to make progress in the short
term, I think we just need to make sure this handles all pcie_bus_config
settings.

Bjorn



Stepping back a long ways, I think the current design is hard to use.
It's set up with the idea that we (1) enumerate all the devices in the
system, and then (2) configure MPS for everything all at once.

That's not a very good fit when we start hotplugging devices, and it's
part of the reason MPS configuration is not well integrated into the PCI
core and doesn't get done at all for most architectures.

What I'd prefer is something that could be done in the core as each device
is enumerated, e.g., in or near pci_device_add().  I know there's tension
between the need to do this before drivers bind to the device and the
desire to enumerate the whole hierarchy before committing to MPS settings.
But we need to handle that tension anyway for hot-added devices, so we
might as well deal with it at boot-time and use the same code path for
both boot-time and hot-add time.

I have in mind something like this:

  pcie_configure_mps(struct pci_dev *dev)
  {
    int ret;

    if (!pci_is_pci(dev))
      return;

    if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
      /* set my MPS to dev->pcie_mpss (max supported size) */
      return;
    }

    if (dev->pcie_mpss >= upstream bridge MPS) {
      /* set my MPS to upstream bridge MPS */
      return;
    }

    ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
    if (ret == failure)
      /* emit warning, can't enable this device */
  }

  struct pci_dev *pcie_root_port(struct pci_dev *dev)
  {
    if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
      return dev;

    return pcie_root_port(dev->bus->self);
  }

  pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
  {
    struct pci_bus *secondary;
    struct pci_dev *dev;
    int ret;

    if (root->driver)
      return -EINVAL;

    secondary = root->subordinate;
    if (secondary) {
      list_for_each_entry(dev, &secondary->devices, bus_list) {
	ret = pcie_set_hierarchy(dev, mpss);
	if (ret)
	  return ret;
      }
    }

    /* set my MPS to mpss */
    return 0;
  }

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-03 22:42 ` Bjorn Helgaas
@ 2014-09-04  6:12   ` Yijing Wang
  2014-09-04 13:16     ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-09-04  6:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Jordan_Hargrave, keith.busch, jon.mason, Jon Mason

>> + * pcie_bus_update_set - update device mps when device doing hot-add
>> + * @dev: PCI device to set
>> + * 
>> + * After device hot add, mps will be set to default(128B), But the 
>> + * upstream port device's mps may be larger than 128B which was set 
>> + * by firmware during system bootup. Then we should update the device
>> + * mps to equal to its parent mps, Or the device can not work normally.
>> + */
>> +static void pcie_bus_update_set(struct pci_dev *dev)
>> +{
>> +	int mps, p_mps, mpss;
>> +	struct pci_dev *parent;
>> +
>> +	if (!pci_is_pcie(dev) || !dev->bus->self 
>> +			|| !dev->bus->self->is_hotplug_bridge)
> 
> Part of this looks redundant because pcie_bus_configure_set() already
> checks pci_is_pcie().  And I don't know why we need to test
> is_hotplug_bridge here; MPS settings need to be consistent regardless of
> whether the upstream bridge supports hotplug.

Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
It was more like a temporary solution and not perfect one.

> 
>> +		return;
>> +	
>> +	parent = dev->bus->self;
>> +	mps = pcie_get_mps(dev);
>> +	p_mps = pcie_get_mps(parent);
>> +
>> +	if (mps >= p_mps)
>> +		return;
>> +
>> +	mpss = 128 << dev->pcie_mpss;
>> +	if (mpss < p_mps) {
>> +		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>> +				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>> +				mpss, p_mps);
>> +		return;
> 
> Since we can't configure the new device correctly, we really shouldn't
> allow a driver to bind to it.  The current design doesn't have much
> provision for doing that, so warning is probably all we can do.

Yes, bind a driver to the device which mps is not correctly set will cause another problem.

> 
>> +	}
>> +
>> +	pcie_write_mps(dev, p_mps);
>> +	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
>> +			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> +}
>> +
>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>  {
>>  	struct pci_dev *bridge = dev->bus->self;
>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>  		return 0;
>>  
>>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>> +		pcie_bus_update_set(dev);
> 
> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
> problem occur for other pcie_bus_config settings?

We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
This issue won't happen.

> 
>>  		pcie_bus_detect_mps(dev);
>>  		return 0;
>>  	}
> 
> I have some long-term ideas here (below), but to make progress in the short
> term, I think we just need to make sure this handles all pcie_bus_config
> settings.
> 
> Bjorn
> 
> 
> 
> Stepping back a long ways, I think the current design is hard to use.
> It's set up with the idea that we (1) enumerate all the devices in the
> system, and then (2) configure MPS for everything all at once.
> 
> That's not a very good fit when we start hotplugging devices, and it's
> part of the reason MPS configuration is not well integrated into the PCI
> core and doesn't get done at all for most architectures.

Agree, arch code should not be involved the MPS setting. It's arch independent.

> 
> What I'd prefer is something that could be done in the core as each device
> is enumerated, e.g., in or near pci_device_add().  I know there's tension
> between the need to do this before drivers bind to the device and the
> desire to enumerate the whole hierarchy before committing to MPS settings.
> But we need to handle that tension anyway for hot-added devices, so we
> might as well deal with it at boot-time and use the same code path for
> both boot-time and hot-add time.
> 
> I have in mind something like this:
> 
>   pcie_configure_mps(struct pci_dev *dev)
>   {
>     int ret;
> 
>     if (!pci_is_pci(dev))
>       return;
> 
>     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) {
>       /* set my MPS to dev->pcie_mpss (max supported size) */
>       return;
>     }
> 
>     if (dev->pcie_mpss >= upstream bridge MPS) {
>       /* set my MPS to upstream bridge MPS */
>       return;
>     }
> 
>     ret = pcie_set_hierarchy_mps(pcie_root_port(dev), dev->mpss);
>     if (ret == failure)
>       /* emit warning, can't enable this device */

If got failure here, should roll back ? What about set hierarchy mps in reverse order(down to top).

>   }
> 
>   struct pci_dev *pcie_root_port(struct pci_dev *dev)
>   {
>     if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
>       return dev;
> 
>     return pcie_root_port(dev->bus->self);
>   }
> 
>   pcie_set_hierarchy_mps(struct pci_dev *root, int mpss)
>   {
>     struct pci_bus *secondary;
>     struct pci_dev *dev;
>     int ret;
> 
>     if (root->driver)
>       return -EINVAL;

Maybe it's not safe enough, change device's mps has risk unless all its children devices have no driver bound(disabled).
A root port may has no pcieport driver bound, if pcieport driver probe failed. But its children device can work normally.

> 
>     secondary = root->subordinate;
>     if (secondary) {
>       list_for_each_entry(dev, &secondary->devices, bus_list) {
> 	ret = pcie_set_hierarchy(dev, mpss);
> 	if (ret)
> 	  return ret;
>       }
>     }
> 
>     /* set my MPS to mpss */
>     return 0;
>   }





> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-04  6:12   ` Yijing Wang
@ 2014-09-04 13:16     ` Bjorn Helgaas
  2014-09-05  1:27       ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2014-09-04 13:16 UTC (permalink / raw)
  To: Yijing Wang; +Cc: linux-pci, Jordan Hargrave, Keith Busch, Jon Mason, Jon Mason

On Thu, Sep 4, 2014 at 12:12 AM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> + * pcie_bus_update_set - update device mps when device doing hot-add
>>> + * @dev: PCI device to set
>>> + *
>>> + * After device hot add, mps will be set to default(128B), But the
>>> + * upstream port device's mps may be larger than 128B which was set
>>> + * by firmware during system bootup. Then we should update the device
>>> + * mps to equal to its parent mps, Or the device can not work normally.
>>> + */
>>> +static void pcie_bus_update_set(struct pci_dev *dev)
>>> +{
>>> +    int mps, p_mps, mpss;
>>> +    struct pci_dev *parent;
>>> +
>>> +    if (!pci_is_pcie(dev) || !dev->bus->self
>>> +                    || !dev->bus->self->is_hotplug_bridge)
>>
>> Part of this looks redundant because pcie_bus_configure_set() already
>> checks pci_is_pcie().  And I don't know why we need to test
>> is_hotplug_bridge here; MPS settings need to be consistent regardless of
>> whether the upstream bridge supports hotplug.
>
> Hi Bjorn, I added is_hotplug_bridge() here is mainly to touch the hotplug case only.
> It was more like a temporary solution and not perfect one.
>
>>
>>> +            return;
>>> +
>>> +    parent = dev->bus->self;
>>> +    mps = pcie_get_mps(dev);
>>> +    p_mps = pcie_get_mps(parent);
>>> +
>>> +    if (mps >= p_mps)
>>> +            return;
>>> +
>>> +    mpss = 128 << dev->pcie_mpss;
>>> +    if (mpss < p_mps) {
>>> +            dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
>>> +                            "If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
>>> +                            mpss, p_mps);
>>> +            return;
>>
>> Since we can't configure the new device correctly, we really shouldn't
>> allow a driver to bind to it.  The current design doesn't have much
>> provision for doing that, so warning is probably all we can do.
>
> Yes, bind a driver to the device which mps is not correctly set will cause another problem.
>
>>
>>> +    }
>>> +
>>> +    pcie_write_mps(dev, p_mps);
>>> +    dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>>> +                    pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>>> +}
>>> +
>>>  static void pcie_bus_detect_mps(struct pci_dev *dev)
>>>  {
>>>      struct pci_dev *bridge = dev->bus->self;
>>> @@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>              return 0;
>>>
>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>> +            pcie_bus_update_set(dev);
>>
>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>> problem occur for other pcie_bus_config settings?
>
> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
> This issue won't happen.

Sorry, I can't parse this.  Are you saying the problem won't happen in
the other modes?  Why not?

>>>              pcie_bus_detect_mps(dev);
>>>              return 0;
>>>      }

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-04 13:16     ` Bjorn Helgaas
@ 2014-09-05  1:27       ` Yijing Wang
  2014-09-05 14:37         ` Keith Busch
  2014-09-24 22:41         ` Keith Busch
  0 siblings, 2 replies; 42+ messages in thread
From: Yijing Wang @ 2014-09-05  1:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Jordan Hargrave, Keith Busch, Jon Mason, Jon Mason

>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>> +            pcie_bus_update_set(dev);
>>>
>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>> problem occur for other pcie_bus_config settings?
>>
>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>> This issue won't happen.
> 
> Sorry, I can't parse this.  Are you saying the problem won't happen in
> the other modes?  Why not?

Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),

			/* For "Performance", the assumption is made that
			 * downstream communication will never be larger than
			 * the MRRS.  So, the MPS only needs to be configured
			 * for the upstream communication.  This being the case, <------
 			 * walk from the top down and set the MPS of the child
			 * to that of the parent bus.

So I think the problem won't happen in other modes.

Thanks!
Yijing.


> 
>>>>              pcie_bus_detect_mps(dev);
>>>>              return 0;
>>>>      }
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-05  1:27       ` Yijing Wang
@ 2014-09-05 14:37         ` Keith Busch
  2014-09-24 22:41         ` Keith Busch
  1 sibling, 0 replies; 42+ messages in thread
From: Keith Busch @ 2014-09-05 14:37 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave, Keith Busch,
	Jon Mason, Jon Mason

On Thu, 4 Sep 2014, Yijing Wang wrote:
> So I think the problem won't happen in other modes.

Yes, you're right; we currently recommend end-users set one of the other
bus tuning options because the problem doesn't happen there.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-05  1:27       ` Yijing Wang
  2014-09-05 14:37         ` Keith Busch
@ 2014-09-24 22:41         ` Keith Busch
  2014-09-24 23:30           ` Bjorn Helgaas
  1 sibling, 1 reply; 42+ messages in thread
From: Keith Busch @ 2014-09-24 22:41 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave, Keith Busch,
	Jon Mason, Jon Mason

Just poking this thread to make sure it's not dead. :)

I tested Yijing's proposal and it is successful on our Intel server
platforms; hoping either this or something that derives similar behavior
will be applied so we can remove bus tuning kernel parameters.

Tested-by: Keith Busch <keith.busch@intel.com>

On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>> +            pcie_bus_update_set(dev);
>>>>
>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>> problem occur for other pcie_bus_config settings?
>>>
>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>> This issue won't happen.
>>
>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>> the other modes?  Why not?
>
> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the largest available mpss in a pcie path.
> Then call pcie_bus_configure_set() to set all devices' mps to the largest available mps in this path, so
> all devices in the path will have the same mps. When in PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in pcie_write_mps(),
>
> 			/* For "Performance", the assumption is made that
> 			 * downstream communication will never be larger than
> 			 * the MRRS.  So, the MPS only needs to be configured
> 			 * for the upstream communication.  This being the case, <------
> 			 * walk from the top down and set the MPS of the child
> 			 * to that of the parent bus.
>
> So I think the problem won't happen in other modes.
>
> Thanks!
> Yijing.
>
>
>>
>>>>>              pcie_bus_detect_mps(dev);
>>>>>              return 0;
>>>>>      }
>>
>> .
>>
>
>
> -- 
> Thanks!
> Yijing
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-24 22:41         ` Keith Busch
@ 2014-09-24 23:30           ` Bjorn Helgaas
  2014-09-25  1:23             ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2014-09-24 23:30 UTC (permalink / raw)
  To: Keith Busch; +Cc: Yijing Wang, linux-pci, Jordan Hargrave, Jon Mason, Jon Mason

On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
> Just poking this thread to make sure it's not dead. :)
>
> I tested Yijing's proposal and it is successful on our Intel server
> platforms; hoping either this or something that derives similar behavior
> will be applied so we can remove bus tuning kernel parameters.
>
> Tested-by: Keith Busch <keith.busch@intel.com>

Oops, thanks for poking me, because this was indeed dead.

My main objection was to testing "is_hotplug_bridge".  That doesn't
seem right, because this issue really isn't specific to hotplug.  I
didn't see a resolution of that, but let me know if I missed it.

Bjorn

> On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>>
>>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>> +            pcie_bus_update_set(dev);
>>>>>
>>>>>
>>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>>> problem occur for other pcie_bus_config settings?
>>>>
>>>>
>>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
>>>> PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>>> This issue won't happen.
>>>
>>>
>>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>>> the other modes?  Why not?
>>
>>
>> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
>> largest available mpss in a pcie path.
>> Then call pcie_bus_configure_set() to set all devices' mps to the largest
>> available mps in this path, so
>> all devices in the path will have the same mps. When in
>> PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
>> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
>> pcie_write_mps(),
>>
>>                         /* For "Performance", the assumption is made that
>>                          * downstream communication will never be larger
>> than
>>                          * the MRRS.  So, the MPS only needs to be
>> configured
>>                          * for the upstream communication.  This being the
>> case, <------
>>                          * walk from the top down and set the MPS of the
>> child
>>                          * to that of the parent bus.
>>
>> So I think the problem won't happen in other modes.
>>
>> Thanks!
>> Yijing.
>>
>>
>>>
>>>>>>              pcie_bus_detect_mps(dev);
>>>>>>              return 0;
>>>>>>      }
>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-24 23:30           ` Bjorn Helgaas
@ 2014-09-25  1:23             ` Yijing Wang
  2014-09-25 16:46               ` Keith Busch
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-09-25  1:23 UTC (permalink / raw)
  To: Bjorn Helgaas, Keith Busch
  Cc: linux-pci, Jordan Hargrave, Jon Mason, Jon Mason

On 2014/9/25 7:30, Bjorn Helgaas wrote:
> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>> Just poking this thread to make sure it's not dead. :)
>>
>> I tested Yijing's proposal and it is successful on our Intel server
>> platforms; hoping either this or something that derives similar behavior
>> will be applied so we can remove bus tuning kernel parameters.
>>
>> Tested-by: Keith Busch <keith.busch@intel.com>
> 
> Oops, thanks for poking me, because this was indeed dead.
> 
> My main objection was to testing "is_hotplug_bridge".  That doesn't
> seem right, because this issue really isn't specific to hotplug.  I
> didn't see a resolution of that, but let me know if I missed it.

Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.

It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.

I'd like to refactor current MPS framework, but now there are still some puzzles
to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
log from git, I found he turn off all this MPS config, because some issues were found
in some platforms, but no platforms detailed info and no bugzilla records.


> 
> Bjorn
> 
>> On Thu, 4 Sep 2014, Yijing Wang wrote:
>>>>>>>
>>>>>>>      if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
>>>>>>> +            pcie_bus_update_set(dev);
>>>>>>
>>>>>>
>>>>>> You're only adding this to the PCIE_BUS_TUNE_OFF path.  Can't the same
>>>>>> problem occur for other pcie_bus_config settings?
>>>>>
>>>>>
>>>>> We only found the problem during PCIE_BUS_TUNE_OFF set. Other mode like
>>>>> PCIE_BUS_SAFE and PCIE_BUS_PEER2PEER.
>>>>> This issue won't happen.
>>>>
>>>>
>>>> Sorry, I can't parse this.  Are you saying the problem won't happen in
>>>> the other modes?  Why not?
>>>
>>>
>>> Hi Bjorn, when in PCIE_BUS_SAFE mode, pcie_find_smpss() will find the
>>> largest available mpss in a pcie path.
>>> Then call pcie_bus_configure_set() to set all devices' mps to the largest
>>> available mps in this path, so
>>> all devices in the path will have the same mps. When in
>>> PCIE_BUS_PEER2PEER, all devices' mps will be set to 128B
>>> for safety. And to the PCIE_BUS_PERFORMANCE mode, I found Jon's comment in
>>> pcie_write_mps(),
>>>
>>>                         /* For "Performance", the assumption is made that
>>>                          * downstream communication will never be larger
>>> than
>>>                          * the MRRS.  So, the MPS only needs to be
>>> configured
>>>                          * for the upstream communication.  This being the
>>> case, <------
>>>                          * walk from the top down and set the MPS of the
>>> child
>>>                          * to that of the parent bus.
>>>
>>> So I think the problem won't happen in other modes.
>>>
>>> Thanks!
>>> Yijing.
>>>
>>>
>>>>
>>>>>>>              pcie_bus_detect_mps(dev);
>>>>>>>              return 0;
>>>>>>>      }
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-25  1:23             ` Yijing Wang
@ 2014-09-25 16:46               ` Keith Busch
  2014-09-26  3:22                 ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Keith Busch @ 2014-09-25 16:46 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, Keith Busch, linux-pci, Jordan Hargrave, Jon Mason

On Wed, 24 Sep 2014, Yijing Wang wrote:
> On 2014/9/25 7:30, Bjorn Helgaas wrote:
>> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>>> Just poking this thread to make sure it's not dead. :)
>>>
>>> I tested Yijing's proposal and it is successful on our Intel server
>>> platforms; hoping either this or something that derives similar behavior
>>> will be applied so we can remove bus tuning kernel parameters.
>>>
>>> Tested-by: Keith Busch <keith.busch@intel.com>
>>
>> Oops, thanks for poking me, because this was indeed dead.
>>
>> My main objection was to testing "is_hotplug_bridge".  That doesn't
>> seem right, because this issue really isn't specific to hotplug.  I
>> didn't see a resolution of that, but let me know if I missed it.
>
> Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
> in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
>
> It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
>
> I'd like to refactor current MPS framework, but now there are still some puzzles
> to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
> log from git, I found he turn off all this MPS config, because some issues were found
> in some platforms, but no platforms detailed info and no bugzilla records.

Just my opinion, I thought the hotplug check was a good idea: it addresses
a known issue, and does not mess with current unknowns. Outside a hotplug
scenario, I think we expect platform f/w to handle MPS settings and the
kernel can stay out of the way because of the unknown platform issues. If
it is hotplug, having the kernel set device's MPS to match the parent
couldn't make things worse off than doing nothing, right?

On the side, I'll see if I can ping some comrades on PCI-SIG to propose
an ECN to clarify configuring MPS. They usually ignore me though, so no
promises. :)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-25 16:46               ` Keith Busch
@ 2014-09-26  3:22                 ` Yijing Wang
  2014-10-02 15:31                   ` Jordan_Hargrave
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2014-09-26  3:22 UTC (permalink / raw)
  To: Keith Busch; +Cc: Bjorn Helgaas, linux-pci, Jordan Hargrave, Jon Mason

On 2014/9/26 0:46, Keith Busch wrote:
> On Wed, 24 Sep 2014, Yijing Wang wrote:
>> On 2014/9/25 7:30, Bjorn Helgaas wrote:
>>> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>>>> Just poking this thread to make sure it's not dead. :)
>>>>
>>>> I tested Yijing's proposal and it is successful on our Intel server
>>>> platforms; hoping either this or something that derives similar behavior
>>>> will be applied so we can remove bus tuning kernel parameters.
>>>>
>>>> Tested-by: Keith Busch <keith.busch@intel.com>
>>>
>>> Oops, thanks for poking me, because this was indeed dead.
>>>
>>> My main objection was to testing "is_hotplug_bridge".  That doesn't
>>> seem right, because this issue really isn't specific to hotplug.  I
>>> didn't see a resolution of that, but let me know if I missed it.
>>
>> Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
>> in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
>>
>> It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
>>
>> I'd like to refactor current MPS framework, but now there are still some puzzles
>> to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
>> log from git, I found he turn off all this MPS config, because some issues were found
>> in some platforms, but no platforms detailed info and no bugzilla records.
> 
> Just my opinion, I thought the hotplug check was a good idea: it addresses
> a known issue, and does not mess with current unknowns. Outside a hotplug
> scenario, I think we expect platform f/w to handle MPS settings and the
> kernel can stay out of the way because of the unknown platform issues. If
> it is hotplug, having the kernel set device's MPS to match the parent
> couldn't make things worse off than doing nothing, right?

Yes, I think so, but it all decided by Bjorn. I think he want a better solution,
and current patch just still is a temporary fix.

> 
> On the side, I'll see if I can ping some comrades on PCI-SIG to propose
> an ECN to clarify configuring MPS. They usually ignore me though, so no
> promises. :)

Thanks in advance for your help :)

> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH] PCI: update device mps when doing pci hotplug
  2014-09-26  3:22                 ` Yijing Wang
@ 2014-10-02 15:31                   ` Jordan_Hargrave
  0 siblings, 0 replies; 42+ messages in thread
From: Jordan_Hargrave @ 2014-10-02 15:31 UTC (permalink / raw)
  To: wangyijing, keith.busch
  Cc: bhelgaas, linux-pci, jdmason, Jose_De_La_Rosa, Austin_Bolen

\My original proposed patch to handle MPS only lived in the hotplug code path, it wasn't part of the initial PCI configuration.
This is the code that we have had to insert into our drivers (MTIP32XX, NVME) as the current default setting doesn't work for hotplug.

http://www.spinics.net/lists/hotplug/msg04728.html

--jordan hargrave
Dell Enterprise Linux Engineering
________________________________________
From: Yijing Wang [wangyijing@huawei.com]
Sent: Thursday, September 25, 2014 10:22 PM
To: Keith Busch
Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; Hargrave, Jordan; Jon Mason
Subject: Re: [PATCH] PCI: update device mps when doing pci hotplug

On 2014/9/26 0:46, Keith Busch wrote:
> On Wed, 24 Sep 2014, Yijing Wang wrote:
>> On 2014/9/25 7:30, Bjorn Helgaas wrote:
>>> On Wed, Sep 24, 2014 at 4:41 PM, Keith Busch <keith.busch@intel.com> wrote:
>>>> Just poking this thread to make sure it's not dead. :)
>>>>
>>>> I tested Yijing's proposal and it is successful on our Intel server
>>>> platforms; hoping either this or something that derives similar behavior
>>>> will be applied so we can remove bus tuning kernel parameters.
>>>>
>>>> Tested-by: Keith Busch <keith.busch@intel.com>
>>>
>>> Oops, thanks for poking me, because this was indeed dead.
>>>
>>> My main objection was to testing "is_hotplug_bridge".  That doesn't
>>> seem right, because this issue really isn't specific to hotplug.  I
>>> didn't see a resolution of that, but let me know if I missed it.
>>
>> Why I introduced "is_hotplug_bridge" is to avoid to touch the MPS which is not
>> in hotplug case when pcie_bus_config == PCIE_BUS_TUNE_OFF.
>>
>> It's so sad that PCIe spec doesn't give a detailed guide to configure MPS.
>>
>> I'd like to refactor current MPS framework, but now there are still some puzzles
>> to me. I need to have a deeper understanding of pcie mps. I read Jon's mps patch
>> log from git, I found he turn off all this MPS config, because some issues were found
>> in some platforms, but no platforms detailed info and no bugzilla records.
>
> Just my opinion, I thought the hotplug check was a good idea: it addresses
> a known issue, and does not mess with current unknowns. Outside a hotplug
> scenario, I think we expect platform f/w to handle MPS settings and the
> kernel can stay out of the way because of the unknown platform issues. If
> it is hotplug, having the kernel set device's MPS to match the parent
> couldn't make things worse off than doing nothing, right?

Yes, I think so, but it all decided by Bjorn. I think he want a better solution,
and current patch just still is a temporary fix.

>
> On the side, I'll see if I can ping some comrades on PCI-SIG to propose
> an ECN to clarify configuring MPS. They usually ignore me though, so no
> promises. :)

Thanks in advance for your help :)

>
> .
>


--
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] PCI: update device mps when doing pci hotplug
@ 2014-07-29  8:23 Yijing Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2014-07-29  8:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Jordan_Hargrave, keith.busch, jdmason, Yijing Wang

Currently we don't update device's mps value when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally. This issue was found in huawei 5885 server
and Dell R620 server. And if we run the platform with windows,
this problem is gone. This patch try to update the hot added
device mps equal to its parent mps, if device mpss < parent mps,
print warning.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Keith Busch <keith.busch@intel.com>
Reported-by: Jordan_Hargrave@Dell.com
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
 drivers/pci/probe.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e3cf8a2..583ca52 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1613,6 +1613,44 @@ static void pcie_write_mrrs(struct pci_dev *dev)
 		dev_err(&dev->dev, "MRRS was unable to be configured with a safe value.  If problems are experienced, try running with pci=pcie_bus_safe\n");
 }
 
+/**
+ * pcie_bus_update_set - update device mps when device doing hot-add
+ * @dev: PCI device to set
+ * 
+ * After device hot add, mps will be set to default(128B), But the 
+ * upstream port device's mps may be larger than 128B which was set 
+ * by firmware during system bootup. Then we should update the device
+ * mps to equal to its parent mps, Or the device can not work normally.
+ */
+static void pcie_bus_update_set(struct pci_dev *dev)
+{
+	int mps, p_mps, mpss;
+	struct pci_dev *parent;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self 
+			|| !dev->bus->self->is_hotplug_bridge)
+		return;
+	
+	parent = dev->bus->self;
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(parent);
+
+	if (mps >= p_mps)
+		return;
+
+	mpss = 128 << dev->pcie_mpss;
+	if (mpss < p_mps) {
+		dev_warn(&dev->dev, "MPSS %d smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_safe\" boot parameter to avoid this problem\n",
+				mpss, p_mps);
+		return;
+	}
+
+	pcie_write_mps(dev, p_mps);
+	dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n", 
+			pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+}
+
 static void pcie_bus_detect_mps(struct pci_dev *dev)
 {
 	struct pci_dev *bridge = dev->bus->self;
@@ -1637,6 +1675,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 		return 0;
 
 	if (pcie_bus_config == PCIE_BUS_TUNE_OFF) {
+		pcie_bus_update_set(dev);
 		pcie_bus_detect_mps(dev);
 		return 0;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-31 20:42               ` Bjorn Helgaas
@ 2013-08-01  1:23                 ` Yijing Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2013-08-01  1:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

>> Yes, I think that would be safe.  If the switch is set to a larger MPS
>> than the hot-added device supports, I don't think we can safely use
>> the device.
> 
> I opened a bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> for this problem.  Please correct any mistakes in my summary and
> reference it in your changelog.

OK, thanks!

> 
> Bjorn
> 
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-31 17:53             ` Bjorn Helgaas
  2013-07-31 20:42               ` Bjorn Helgaas
@ 2013-08-01  1:21               ` Yijing Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Yijing Wang @ 2013-08-01  1:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

> Yes, I think that would be safe.  If the switch is set to a larger MPS
> than the hot-added device supports, I don't think we can safely use
> the device.

OK, I will refresh my patch, after test in my machine, I will send it out.

Thanks!
Yijing.

> 
> Bjorn
> 
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-31 17:53             ` Bjorn Helgaas
@ 2013-07-31 20:42               ` Bjorn Helgaas
  2013-08-01  1:23                 ` Yijing Wang
  2013-08-01  1:21               ` Yijing Wang
  1 sibling, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2013-07-31 20:42 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On Wed, Jul 31, 2013 at 11:53 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Wed, Jul 31, 2013 at 3:15 AM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Hi Bjorn,
>>    I didn't observe a performance difference between MPS=128 and MPS=512. I use ping $dest_ip -s 65500(large size packet)
>> to test the different situations.
>
> Interesting.  "ping" is probably not a good way to see performance
> differences, but hopefully you could see a difference in *some*
> scenario.  Otherwise, there's not much point in increasing MPS :)
>
>>> I assume there are no AER or other errors logged by the root port?
>> Yes, AER is not support in local machine.
>
> Per the 5520/5500 spec, it does support AER (sec 19.11.5).  Maybe
> there's some platform support required in addition.  You might still
> be able to see some info just with "lspci -vv"
>
>> Hmmm, PCIe Spec does not involve too much about MPS setting. So maybe different platform
>> has different strategy.
>
> I think there's enough in the spec to tell us what we need to do (this
> is sec 2.2.2):
>
>   - A Transmitter must not send a TLP larger than its Max_Payload_Size
>   - A Receiver must treat TLPs larger than its Max_Payload_Size as malformed
>
> The only way I can see to guarantee that is to set the MPS on both
> ends of the link the same.
>
>> Conservatively, as a improvement for mps setting after hotplug. I think update mps setting equal to its parent
>> make sense. This is no harm to other devices, we only modify the hotplug device itself mps register.
>>
>> So if you agree, I will update my patch ,only try to modify hotplug device mps, make them equal to its parent.
>
> Yes, I think that would be safe.  If the switch is set to a larger MPS
> than the hot-added device supports, I don't think we can safely use
> the device.

I opened a bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60671
for this problem.  Please correct any mistakes in my summary and
reference it in your changelog.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-31  9:15           ` Yijing Wang
@ 2013-07-31 17:53             ` Bjorn Helgaas
  2013-07-31 20:42               ` Bjorn Helgaas
  2013-08-01  1:21               ` Yijing Wang
  0 siblings, 2 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2013-07-31 17:53 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On Wed, Jul 31, 2013 at 3:15 AM, Yijing Wang <wangyijing@huawei.com> wrote:
> Hi Bjorn,
>    I didn't observe a performance difference between MPS=128 and MPS=512. I use ping $dest_ip -s 65500(large size packet)
> to test the different situations.

Interesting.  "ping" is probably not a good way to see performance
differences, but hopefully you could see a difference in *some*
scenario.  Otherwise, there's not much point in increasing MPS :)

>> I assume there are no AER or other errors logged by the root port?
> Yes, AER is not support in local machine.

Per the 5520/5500 spec, it does support AER (sec 19.11.5).  Maybe
there's some platform support required in addition.  You might still
be able to see some info just with "lspci -vv"

> Hmmm, PCIe Spec does not involve too much about MPS setting. So maybe different platform
> has different strategy.

I think there's enough in the spec to tell us what we need to do (this
is sec 2.2.2):

  - A Transmitter must not send a TLP larger than its Max_Payload_Size
  - A Receiver must treat TLPs larger than its Max_Payload_Size as malformed

The only way I can see to guarantee that is to set the MPS on both
ends of the link the same.

> Conservatively, as a improvement for mps setting after hotplug. I think update mps setting equal to its parent
> make sense. This is no harm to other devices, we only modify the hotplug device itself mps register.
>
> So if you agree, I will update my patch ,only try to modify hotplug device mps, make them equal to its parent.

Yes, I think that would be safe.  If the switch is set to a larger MPS
than the hot-added device supports, I don't think we can safely use
the device.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-30 22:29         ` Bjorn Helgaas
@ 2013-07-31  9:15           ` Yijing Wang
  2013-07-31 17:53             ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2013-07-31  9:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

>>> PCIe Spec does not explicitly mention this issue, we can only get the message that
>>> root port/ root complex can split the TLP into smaller packets. For instance
>>> one 256B packet split into two 128B packet.
>>>
>>> I confirm this issue in my X86 machine and IA64 machine.
>>> 1. I unload NIC driver to make sure the safety during  change the NIC MPS.
>>> 2. Use setpci change NIC MPS to the max value it supports.
>>> 3. Reload the NIC driver
>>> 4. Ping and use scp cpoy large file bwtween machines. Result is ok.
> 
> Just as a way to confirm that the MPS change is actually doing
> something, I assume you observe a performance difference between
> MPS=128 and MPS=512 on the NIC (and the root port MPS=128 in both
> cases)?  Or maybe you can confirm with an analyzer that there are
> actually 512-byte TLPs on the link?

Hi Bjorn,
   I didn't observe a performance difference between MPS=128 and MPS=512. I use ping $dest_ip -s 65500(large size packet)
to test the different situations.

1. root port MPS = 128, EP MPS = 256.

root port --------Endpoint device
00:01.0           01:00.1

In this case, I use ping in the local machine, and result is ok.
linux:~ # ping 128.5.160.28 -s 65500
PING 128.5.160.28 (128.5.160.28) 65500(65528) bytes of data.
65508 bytes from 128.5.160.28: icmp_seq=1 ttl=64 time=1.43 ms
65508 bytes from 128.5.160.28: icmp_seq=2 ttl=64 time=1.42 ms
65508 bytes from 128.5.160.28: icmp_seq=3 ttl=64 time=1.41 ms
65508 bytes from 128.5.160.28: icmp_seq=4 ttl=64 time=1.37 ms
65508 bytes from 128.5.160.28: icmp_seq=5 ttl=64 time=1.43 ms
..........

 \-[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
             +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

linux:~ # lspci -vvv -s 01:00.1
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
............[snip].............
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes

linux:~ # lspci -vvv -s 00:01.0
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode])
...........[snip]..............
	Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes


2. root port MPS = 256, EP MPS = 128.
In this case, use "ping $dest_ip -s 65500" to test, but result is fail.

So I guess the packet size during ping is larger than 128, EP device discard these TLPs.

I have no analyzer to catch the TLP packets. So I can not Guarantee this conclusion(EP MPS larger than Root port is 100% safe).

> 
> I assume there are no AER or other errors logged by the root port?
Yes, AER is not support in local machine.

> The test you showed was a copy *to* the local machine, so the NIC
> would have been doing DMA writes to memory.  I assume it works equally
> well doing a copy *from* the local machine to another machine across
> the network, where the NIC is doing DMA reads from memory?

Yes, I tested in both copy direction, and result is ok.

> The only mention I can find in the spec is sec 1.3.1, where it says "a
> Root Complex is generally permitted to split a packet into smaller
> packets when routing transactions peer-to-peer between hierarchy
> domains ..."
> 
> I'm not a hardware guy (I often wish I were :)), but here's how I
> interpret that statement.  Let's take the following example:
> 
>   00:01.0 Root port bridge to [bus 01] MPS=128
>   01:00.1 Endpoint MPS=512
> 
>   00:02.0 Root Port bridge to [bus 02] MPS=256
>   00:03.0 Root Port bridge to [bus 03] MPS=128
>   02:00.0 Endpoint MPS=256
>   03:00.0 Endpoint MPS=128
> 
> If 02:00.0 (MPS=256) generates a DMA write destined for 03:00.0, it
> may transmit a TLP with a data payload of 256 bytes, and 00:02.0
> (MPS=256 also) will accept it.  The root complex may route the packet
> to 00:03.0 (MPS=128), and here it would need to be split into two
> 128-byte TLPs before being transmitted by 00:03.0 to 03:00.0
> (MPS=128).
> 
> Your situation is basically 01:00.1 (MPS=512) doing a DMA write
> destined for memory and sending a 512-byte TLP to 00:01.0 (MPS=128).
> In this case, the root complex isn't doing any peer-to-peer routing
> between hierarchy domains, so I don't think the statement in sec 1.3.1
> applies.  So I don't understand why the root port would accept that
> TLP.  I would think it would report a malformed TLP error.

Hmmm, PCIe Spec does not involve too much about MPS setting. So maybe different platform
has different strategy.

Conservatively, as a improvement for mps setting after hotplug. I think update mps setting equal to its parent
make sense. This is no harm to other devices, we only modify the hotplug device itself mps register.

So if you agree, I will update my patch ,only try to modify hotplug device mps, make them equal to its parent.

Thanks!
Yijing.



-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-30  3:42       ` Bjorn Helgaas
@ 2013-07-30 22:29         ` Bjorn Helgaas
  2013-07-31  9:15           ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2013-07-30 22:29 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On Mon, Jul 29, 2013 at 9:42 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jul 29, 2013 at 9:20 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> On 2013/7/30 7:33, Bjorn Helgaas wrote:
>>> On Mon, May 27, 2013 at 9:15 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> Hi Bjorn and Jon,
>>>>    I'm sorry to disturb you. This patch is sent so long, but nobody seems had comment about it.
>>>> Do you have any comment with this patch?
>>>>
>>>> This patch try to update device mps in following case:
>>>> 1) target device under root port
>>>>    Because root port can split TLP, so target device mps greatr than root port mps is ok.
>>>>    But if root port mps greater than target device mps, it's bad, because target device cannot
>>>>    receive TLP payload size greater than its MPS. So if a target device under a root port, I think
>>>>    we should assign its mps greater than or equal root port mps.
>>>> 2) target device under non root port
>>>>    We assume the target device both is a transmitter and receiver, so the safest way is to assign target
>>>>    device mps equal to its parent device.
>>>
>>> Thanks, I just started reviewing this patch, and your notes above are
>>> exactly the question I was going to ask.  The comments in
>>> pcie_bus_update_set() only tell me what the code does.  I can read the
>>> C code just fine; what we need there is the explanation about *why* we
>>> handle devices below root ports differently than others.  Maybe we can
>>> adapt some of your notes as comments in the code.
>>
>> Hi Bjorn,
>>    Thanks for your review and comments!
>>
>>>
>>> Do you have references to the spec where it talks about this
>>> difference?  I want to make sure we can rely on the fact that a root
>>> port can accept TLPs larger than its MPS.
>>
>> PCIe Spec does not explicitly mention this issue, we can only get the message that
>> root port/ root complex can split the TLP into smaller packets. For instance
>> one 256B packet split into two 128B packet.
>>
>> I confirm this issue in my X86 machine and IA64 machine.
>> 1. I unload NIC driver to make sure the safety during  change the NIC MPS.
>> 2. Use setpci change NIC MPS to the max value it supports.
>> 3. Reload the NIC driver
>> 4. Ping and use scp cpoy large file bwtween machines. Result is ok.

Just as a way to confirm that the MPS change is actually doing
something, I assume you observe a performance difference between
MPS=128 and MPS=512 on the NIC (and the root port MPS=128 in both
cases)?  Or maybe you can confirm with an analyzer that there are
actually 512-byte TLPs on the link?

I assume there are no AER or other errors logged by the root port?
The test you showed was a copy *to* the local machine, so the NIC
would have been doing DMA writes to memory.  I assume it works equally
well doing a copy *from* the local machine to another machine across
the network, where the NIC is doing DMA reads from memory?

> The fact that it works on two pieces of hardware is not enough to be
> confident that it will work on all spec-conforming hardware.  Maybe we
> can deduce this from something in the spec, but I'll have to dig into
> it more tomorrow.  I just hoped that you had a spec reference that
> could save me some time.

The only mention I can find in the spec is sec 1.3.1, where it says "a
Root Complex is generally permitted to split a packet into smaller
packets when routing transactions peer-to-peer between hierarchy
domains ..."

I'm not a hardware guy (I often wish I were :)), but here's how I
interpret that statement.  Let's take the following example:

  00:01.0 Root port bridge to [bus 01] MPS=128
  01:00.1 Endpoint MPS=512

  00:02.0 Root Port bridge to [bus 02] MPS=256
  00:03.0 Root Port bridge to [bus 03] MPS=128
  02:00.0 Endpoint MPS=256
  03:00.0 Endpoint MPS=128

If 02:00.0 (MPS=256) generates a DMA write destined for 03:00.0, it
may transmit a TLP with a data payload of 256 bytes, and 00:02.0
(MPS=256 also) will accept it.  The root complex may route the packet
to 00:03.0 (MPS=128), and here it would need to be split into two
128-byte TLPs before being transmitted by 00:03.0 to 03:00.0
(MPS=128).

Your situation is basically 01:00.1 (MPS=512) doing a DMA write
destined for memory and sending a 512-byte TLP to 00:01.0 (MPS=128).
In this case, the root complex isn't doing any peer-to-peer routing
between hierarchy domains, so I don't think the statement in sec 1.3.1
applies.  So I don't understand why the root port would accept that
TLP.  I would think it would report a malformed TLP error.

Bjorn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-30  3:20     ` Yijing Wang
@ 2013-07-30  3:42       ` Bjorn Helgaas
  2013-07-30 22:29         ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2013-07-30  3:42 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On Mon, Jul 29, 2013 at 9:20 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> On 2013/7/30 7:33, Bjorn Helgaas wrote:
>> On Mon, May 27, 2013 at 9:15 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Hi Bjorn and Jon,
>>>    I'm sorry to disturb you. This patch is sent so long, but nobody seems had comment about it.
>>> Do you have any comment with this patch?
>>>
>>> This patch try to update device mps in following case:
>>> 1) target device under root port
>>>    Because root port can split TLP, so target device mps greatr than root port mps is ok.
>>>    But if root port mps greater than target device mps, it's bad, because target device cannot
>>>    receive TLP payload size greater than its MPS. So if a target device under a root port, I think
>>>    we should assign its mps greater than or equal root port mps.
>>> 2) target device under non root port
>>>    We assume the target device both is a transmitter and receiver, so the safest way is to assign target
>>>    device mps equal to its parent device.
>>
>> Thanks, I just started reviewing this patch, and your notes above are
>> exactly the question I was going to ask.  The comments in
>> pcie_bus_update_set() only tell me what the code does.  I can read the
>> C code just fine; what we need there is the explanation about *why* we
>> handle devices below root ports differently than others.  Maybe we can
>> adapt some of your notes as comments in the code.
>
> Hi Bjorn,
>    Thanks for your review and comments!
>
>>
>> Do you have references to the spec where it talks about this
>> difference?  I want to make sure we can rely on the fact that a root
>> port can accept TLPs larger than its MPS.
>
> PCIe Spec does not explicitly mention this issue, we can only get the message that
> root port/ root complex can split the TLP into smaller packets. For instance
> one 256B packet split into two 128B packet.
>
> I confirm this issue in my X86 machine and IA64 machine.
> 1. I unload NIC driver to make sure the safety during  change the NIC MPS.
> 2. Use setpci change NIC MPS to the max value it supports.
> 3. Reload the NIC driver
> 4. Ping and use scp cpoy large file bwtween machines. Result is ok.

The fact that it works on two pieces of hardware is not enough to be
confident that it will work on all spec-conforming hardware.  Maybe we
can deduce this from something in the spec, but I'll have to dig into
it more tomorrow.  I just hoped that you had a spec reference that
could save me some time.

> linux:/home/yijing # lspci -tv
>  \-[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
>              +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
>              |            \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
>              +-03.0-[02]----00.0  Xilinx Corporation Default PCIe endpoint ID
>              +-07.0-[03]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
>              |            \-00.1  Intel Corporation 82576 Gigabit Network Connection
>              +-09.0-[04]----00.0  LSI Logic / Symbios Logic MegaRAID SAS 1078
>                 ................
>
> linux:/home/yijing # ifconfig
> eth1      Link encap:Ethernet  HWaddr 80:FB:06:AD:B2:FF
>           inet addr:128.5.160.31  Bcast:128.5.160.255  Mask:255.255.255.0
>           inet6 addr: fe80::82fb:6ff:fead:b2ff/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2737201 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2665883 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:3681912141 (3511.3 Mb)  TX bytes:3672206941 (3502.0 Mb)
>
> linux:/home/yijing # ethtool -i eth1
> driver: bnx2
> version: 2.2.3
> firmware-version: bc 4.6.4
> bus-info: 0000:01:00.1  ------------->device
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
>
> linux:/home/yijing # lspci -vvv -s 0000:00:01.0
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode])
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 256 bytes
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 0000f000-00000fff
>         Memory behind bridge: c0000000-c3ffffff
>         Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Subsystem: Device 19e5:2008
>         Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
>                 Address: 00000000  Data: 0000
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag+ RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes         --------------------------->root port device, MPS is 128B
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <512ns, L1 <64us
>                         ClockPM- Surprise+ LLActRep+ BwNot+
>                 .........[snip].......
>
>
> linux:/home/yijing # lspci -vvv -s 01:00.1        ----------------->EP device, MPS change from 128B to 512B
> 01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>         Subsystem: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 256 bytes
>         Interrupt: pin B routed to IRQ 40
>         Region 0: Memory at c2000000 (64-bit, non-prefetchable) [size=32M]
>         Capabilities: [48] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] Vital Product Data
>                 Product Name: Broadcom NetXtreme II Ethernet Controller
>                 Read-only fields:
>                         [PN] Part number: BCM95706A0
>                         [EC] Engineering changes: 220197-2
>                         [SN] Serial number: 0123456789
>                         [MN] Manufacture ID: 31 34 65 34
>                         [RV] Reserved: checksum good, 31 byte(s) reserved
>                 End
>         Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
>                 Vector table: BAR=0 offset=0000c000
>                 PBA: BAR=0 offset=0000e000
>         Capabilities: [ac] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 512 bytes, MaxReadReq 512 bytes  ---------------------------->EP device, MPS is 512B
>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1 <2us
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>
>                 ...............[snip].............
>
> linux:/home/yijing # scp yijing@128.5.160.28:/home/yijing/ISO/HUAWEI_Enterprise_Linux_B016.iso ./
> yijing@128.5.160.28's password:
> HUAWEI_Enterprise_Linux_B016.iso                                                                                                   100% 3318MB  53.5MB/s   01:02
>
>
> linux:/home/yijing # ping 128.5.64.144 -l 65530
> WARNING: probably, rcvbuf is not enough to hold preload.
> PING 128.5.64.144 (128.5.64.144) 56(84) bytes of data.
> 64 bytes from 128.5.64.144: icmp_seq=1 ttl=126 time=9.12 ms
> 64 bytes from 128.5.64.144: icmp_seq=2 ttl=126 time=9.11 ms
> 64 bytes from 128.5.64.144: icmp_seq=3 ttl=126 time=10.0 ms
> 64 bytes from 128.5.64.144: icmp_seq=4 ttl=126 time=10.0 ms
> 64 bytes from 128.5.64.144: icmp_seq=5 ttl=126 time=10.0 ms
> 64 bytes from 128.5.64.144: icmp_seq=6 ttl=126 time=10.1 ms
> 64 bytes from 128.5.64.144: icmp_seq=7 ttl=126 time=7.66 ms
> 64 bytes from 128.5.64.144: icmp_seq=8 ttl=126 time=7.94 ms
> 64 bytes from 128.5.64.144: icmp_seq=9 ttl=126 time=59.3 ms
> 64 bytes from 128.5.64.144: icmp_seq=10 ttl=126 time=7.97 ms
> 64 bytes from 128.5.64.144: icmp_seq=11 ttl=126 time=9.68 ms
> 64 bytes from 128.5.64.144: icmp_seq=12 ttl=126 time=8.21 ms
> 64 bytes from 128.5.64.144: icmp_seq=13 ttl=126 time=7.95 ms
> 64 bytes from 128.5.64.144: icmp_seq=14 ttl=126 time=8.04 ms
> 64 bytes from 128.5.64.144: icmp_seq=15 ttl=126 time=7.77 ms
>
>
>
>
>
>
>
>
>
>
>
>
>
>>
>> Bjorn
>>
>>> On 2013/2/5 11:55, Yijing Wang wrote:
>>>> Currently we dont't update device's mps vaule when doing
>>>> pci device hot-add. The hot-added device's mps will be set
>>>> to default value (128B). But the upstream port device's mps
>>>> may be larger than 128B which was set by firmware during
>>>> system bootup. In this case the new added device may not
>>>> work normally.
>>>>
>>>> The reference discussion at
>>>> http://marc.info/?l=linux-pci&m=135420434508910&w=2
>>>> and
>>>> http://marc.info/?l=linux-pci&m=134815603407842&w=2
>>>>
>>>> Reported-by: Joe Jin <joe.jin@oracle.com>
>>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Jon Mason <jdmason@kudzu.us>
>>>> ---
>>>>  drivers/pci/probe.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 49 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index bbe4be7..57d9a5b 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -1556,6 +1556,52 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>>       return 0;
>>>>  }
>>>>
>>>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
>>>> +{
>>>> +     int mps, p_mps;
>>>> +
>>>> +     if (!pci_is_pcie(dev) || !dev->bus->self)
>>>> +             return 0;
>>>> +
>>>> +     mps = pcie_get_mps(dev);
>>>> +     p_mps = pcie_get_mps(dev->bus->self);
>>>> +
>>>> +     if (pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT) {
>>>> +             /* update mps when current device mps is not equal to upstream mps */
>>>> +             if (mps != p_mps)
>>>> +                     goto update;
>>>> +     } else {
>>>> +             /* update mps when current device mps is smaller than upstream mps */
>>>> +             if (mps < p_mps)
>>>> +                     goto update;
>>>> +     }
>>>> +
>>>> +     return 0;
>>>> +
>>>> +update:
>>>> +     /* If current mpss is lager than upstream, use upstream mps to update
>>>> +      * current mps, otherwise print warning info.
>>>> +      */
>>>> +     if ((128 << dev->pcie_mpss) >= p_mps)
>>>> +             pcie_write_mps(dev, p_mps);
>>>> +     else
>>>> +             dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
>>>> +                             "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
>>>> +                             mps, 128 << dev->pcie_mpss, p_mps);
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +static void pcie_bus_update_setting(struct pci_bus *bus)
>>>> +{
>>>> +
>>>> +     /*
>>>> +      * After hot added a pci device, the device's mps will set to default
>>>> +      * vaule(128 bytes). But the upstream port mps may be larger than 128B.
>>>> +      * In this case, we should update this device's mps for better performance.
>>>> +      */
>>>> +     pci_walk_bus(bus, pcie_bus_update_set, NULL);
>>>> +}
>>>> +
>>>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>>>   * parents then children fashion.  If this changes, then this code will not
>>>>   * work as designed.
>>>> @@ -1566,6 +1612,9 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>>>
>>>>       if (!pci_is_pcie(bus->self))
>>>>               return;
>>>> +
>>>> +     /* update mps setting for newly hot added device */
>>>> +     pcie_bus_update_setting(bus);
>>>>
>>>>       if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>>>               return;
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-07-29 23:33   ` Bjorn Helgaas
@ 2013-07-30  3:20     ` Yijing Wang
  2013-07-30  3:42       ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2013-07-30  3:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On 2013/7/30 7:33, Bjorn Helgaas wrote:
> On Mon, May 27, 2013 at 9:15 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Hi Bjorn and Jon,
>>    I'm sorry to disturb you. This patch is sent so long, but nobody seems had comment about it.
>> Do you have any comment with this patch?
>>
>> This patch try to update device mps in following case:
>> 1) target device under root port
>>    Because root port can split TLP, so target device mps greatr than root port mps is ok.
>>    But if root port mps greater than target device mps, it's bad, because target device cannot
>>    receive TLP payload size greater than its MPS. So if a target device under a root port, I think
>>    we should assign its mps greater than or equal root port mps.
>> 2) target device under non root port
>>    We assume the target device both is a transmitter and receiver, so the safest way is to assign target
>>    device mps equal to its parent device.
> 
> Thanks, I just started reviewing this patch, and your notes above are
> exactly the question I was going to ask.  The comments in
> pcie_bus_update_set() only tell me what the code does.  I can read the
> C code just fine; what we need there is the explanation about *why* we
> handle devices below root ports differently than others.  Maybe we can
> adapt some of your notes as comments in the code.

Hi Bjorn,
   Thanks for your review and comments!

> 
> Do you have references to the spec where it talks about this
> difference?  I want to make sure we can rely on the fact that a root
> port can accept TLPs larger than its MPS.

PCIe Spec does not explicitly mention this issue, we can only get the message that
root port/ root complex can split the TLP into smaller packets. For instance
one 256B packet split into two 128B packet.

I confirm this issue in my X86 machine and IA64 machine.
1. I unload NIC driver to make sure the safety during  change the NIC MPS.
2. Use setpci change NIC MPS to the max value it supports.
3. Reload the NIC driver
4. Ping and use scp cpoy large file bwtween machines. Result is ok.

linux:/home/yijing # lspci -tv
 \-[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
             +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
             +-03.0-[02]----00.0  Xilinx Corporation Default PCIe endpoint ID
             +-07.0-[03]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
             |            \-00.1  Intel Corporation 82576 Gigabit Network Connection
             +-09.0-[04]----00.0  LSI Logic / Symbios Logic MegaRAID SAS 1078
		................

linux:/home/yijing # ifconfig
eth1      Link encap:Ethernet  HWaddr 80:FB:06:AD:B2:FF
          inet addr:128.5.160.31  Bcast:128.5.160.255  Mask:255.255.255.0
          inet6 addr: fe80::82fb:6ff:fead:b2ff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2737201 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2665883 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3681912141 (3511.3 Mb)  TX bytes:3672206941 (3502.0 Mb)

linux:/home/yijing # ethtool -i eth1
driver: bnx2
version: 2.2.3
firmware-version: bc 4.6.4
bus-info: 0000:01:00.1  ------------->device
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

linux:/home/yijing # lspci -vvv -s 0000:00:01.0
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 256 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: c0000000-c3ffffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: Device 19e5:2008
	Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
		Address: 00000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes         --------------------------->root port device, MPS is 128B
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM- Surprise+ LLActRep+ BwNot+
		.........[snip].......


linux:/home/yijing # lspci -vvv -s 01:00.1        ----------------->EP device, MPS change from 128B to 512B
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	Subsystem: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 256 bytes
	Interrupt: pin B routed to IRQ 40
	Region 0: Memory at c2000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetXtreme II Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95706A0
			[EC] Engineering changes: 220197-2
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 34 65 34
			[RV] Reserved: checksum good, 31 byte(s) reserved
		End
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
		Vector table: BAR=0 offset=0000c000
		PBA: BAR=0 offset=0000e000
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 512 bytes, MaxReadReq 512 bytes  ---------------------------->EP device, MPS is 512B
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

 		...............[snip].............

linux:/home/yijing # scp yijing@128.5.160.28:/home/yijing/ISO/HUAWEI_Enterprise_Linux_B016.iso ./
yijing@128.5.160.28's password:
HUAWEI_Enterprise_Linux_B016.iso                                                                                                   100% 3318MB  53.5MB/s   01:02


linux:/home/yijing # ping 128.5.64.144 -l 65530
WARNING: probably, rcvbuf is not enough to hold preload.
PING 128.5.64.144 (128.5.64.144) 56(84) bytes of data.
64 bytes from 128.5.64.144: icmp_seq=1 ttl=126 time=9.12 ms
64 bytes from 128.5.64.144: icmp_seq=2 ttl=126 time=9.11 ms
64 bytes from 128.5.64.144: icmp_seq=3 ttl=126 time=10.0 ms
64 bytes from 128.5.64.144: icmp_seq=4 ttl=126 time=10.0 ms
64 bytes from 128.5.64.144: icmp_seq=5 ttl=126 time=10.0 ms
64 bytes from 128.5.64.144: icmp_seq=6 ttl=126 time=10.1 ms
64 bytes from 128.5.64.144: icmp_seq=7 ttl=126 time=7.66 ms
64 bytes from 128.5.64.144: icmp_seq=8 ttl=126 time=7.94 ms
64 bytes from 128.5.64.144: icmp_seq=9 ttl=126 time=59.3 ms
64 bytes from 128.5.64.144: icmp_seq=10 ttl=126 time=7.97 ms
64 bytes from 128.5.64.144: icmp_seq=11 ttl=126 time=9.68 ms
64 bytes from 128.5.64.144: icmp_seq=12 ttl=126 time=8.21 ms
64 bytes from 128.5.64.144: icmp_seq=13 ttl=126 time=7.95 ms
64 bytes from 128.5.64.144: icmp_seq=14 ttl=126 time=8.04 ms
64 bytes from 128.5.64.144: icmp_seq=15 ttl=126 time=7.77 ms













> 
> Bjorn
> 
>> On 2013/2/5 11:55, Yijing Wang wrote:
>>> Currently we dont't update device's mps vaule when doing
>>> pci device hot-add. The hot-added device's mps will be set
>>> to default value (128B). But the upstream port device's mps
>>> may be larger than 128B which was set by firmware during
>>> system bootup. In this case the new added device may not
>>> work normally.
>>>
>>> The reference discussion at
>>> http://marc.info/?l=linux-pci&m=135420434508910&w=2
>>> and
>>> http://marc.info/?l=linux-pci&m=134815603407842&w=2
>>>
>>> Reported-by: Joe Jin <joe.jin@oracle.com>
>>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>> Cc: Jon Mason <jdmason@kudzu.us>
>>> ---
>>>  drivers/pci/probe.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 49 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index bbe4be7..57d9a5b 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -1556,6 +1556,52 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>>       return 0;
>>>  }
>>>
>>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
>>> +{
>>> +     int mps, p_mps;
>>> +
>>> +     if (!pci_is_pcie(dev) || !dev->bus->self)
>>> +             return 0;
>>> +
>>> +     mps = pcie_get_mps(dev);
>>> +     p_mps = pcie_get_mps(dev->bus->self);
>>> +
>>> +     if (pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT) {
>>> +             /* update mps when current device mps is not equal to upstream mps */
>>> +             if (mps != p_mps)
>>> +                     goto update;
>>> +     } else {
>>> +             /* update mps when current device mps is smaller than upstream mps */
>>> +             if (mps < p_mps)
>>> +                     goto update;
>>> +     }
>>> +
>>> +     return 0;
>>> +
>>> +update:
>>> +     /* If current mpss is lager than upstream, use upstream mps to update
>>> +      * current mps, otherwise print warning info.
>>> +      */
>>> +     if ((128 << dev->pcie_mpss) >= p_mps)
>>> +             pcie_write_mps(dev, p_mps);
>>> +     else
>>> +             dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
>>> +                             "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
>>> +                             mps, 128 << dev->pcie_mpss, p_mps);
>>> +     return 0;
>>> +}
>>> +
>>> +static void pcie_bus_update_setting(struct pci_bus *bus)
>>> +{
>>> +
>>> +     /*
>>> +      * After hot added a pci device, the device's mps will set to default
>>> +      * vaule(128 bytes). But the upstream port mps may be larger than 128B.
>>> +      * In this case, we should update this device's mps for better performance.
>>> +      */
>>> +     pci_walk_bus(bus, pcie_bus_update_set, NULL);
>>> +}
>>> +
>>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>>   * parents then children fashion.  If this changes, then this code will not
>>>   * work as designed.
>>> @@ -1566,6 +1612,9 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>>
>>>       if (!pci_is_pcie(bus->self))
>>>               return;
>>> +
>>> +     /* update mps setting for newly hot added device */
>>> +     pcie_bus_update_setting(bus);
>>>
>>>       if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>>               return;
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-05-28  3:15 ` Yijing Wang
@ 2013-07-29 23:33   ` Bjorn Helgaas
  2013-07-30  3:20     ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2013-07-29 23:33 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jon Mason, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	Jiang Liu, Joe Jin

On Mon, May 27, 2013 at 9:15 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Hi Bjorn and Jon,
>    I'm sorry to disturb you. This patch is sent so long, but nobody seems had comment about it.
> Do you have any comment with this patch?
>
> This patch try to update device mps in following case:
> 1) target device under root port
>    Because root port can split TLP, so target device mps greatr than root port mps is ok.
>    But if root port mps greater than target device mps, it's bad, because target device cannot
>    receive TLP payload size greater than its MPS. So if a target device under a root port, I think
>    we should assign its mps greater than or equal root port mps.
> 2) target device under non root port
>    We assume the target device both is a transmitter and receiver, so the safest way is to assign target
>    device mps equal to its parent device.

Thanks, I just started reviewing this patch, and your notes above are
exactly the question I was going to ask.  The comments in
pcie_bus_update_set() only tell me what the code does.  I can read the
C code just fine; what we need there is the explanation about *why* we
handle devices below root ports differently than others.  Maybe we can
adapt some of your notes as comments in the code.

Do you have references to the spec where it talks about this
difference?  I want to make sure we can rely on the fact that a root
port can accept TLPs larger than its MPS.

Bjorn

> On 2013/2/5 11:55, Yijing Wang wrote:
>> Currently we dont't update device's mps vaule when doing
>> pci device hot-add. The hot-added device's mps will be set
>> to default value (128B). But the upstream port device's mps
>> may be larger than 128B which was set by firmware during
>> system bootup. In this case the new added device may not
>> work normally.
>>
>> The reference discussion at
>> http://marc.info/?l=linux-pci&m=135420434508910&w=2
>> and
>> http://marc.info/?l=linux-pci&m=134815603407842&w=2
>>
>> Reported-by: Joe Jin <joe.jin@oracle.com>
>> Reported-by: Yijing Wang <wangyijing@huawei.com>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Jon Mason <jdmason@kudzu.us>
>> ---
>>  drivers/pci/probe.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 49 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index bbe4be7..57d9a5b 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1556,6 +1556,52 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>>       return 0;
>>  }
>>
>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
>> +{
>> +     int mps, p_mps;
>> +
>> +     if (!pci_is_pcie(dev) || !dev->bus->self)
>> +             return 0;
>> +
>> +     mps = pcie_get_mps(dev);
>> +     p_mps = pcie_get_mps(dev->bus->self);
>> +
>> +     if (pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT) {
>> +             /* update mps when current device mps is not equal to upstream mps */
>> +             if (mps != p_mps)
>> +                     goto update;
>> +     } else {
>> +             /* update mps when current device mps is smaller than upstream mps */
>> +             if (mps < p_mps)
>> +                     goto update;
>> +     }
>> +
>> +     return 0;
>> +
>> +update:
>> +     /* If current mpss is lager than upstream, use upstream mps to update
>> +      * current mps, otherwise print warning info.
>> +      */
>> +     if ((128 << dev->pcie_mpss) >= p_mps)
>> +             pcie_write_mps(dev, p_mps);
>> +     else
>> +             dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
>> +                             "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
>> +                             mps, 128 << dev->pcie_mpss, p_mps);
>> +     return 0;
>> +}
>> +
>> +static void pcie_bus_update_setting(struct pci_bus *bus)
>> +{
>> +
>> +     /*
>> +      * After hot added a pci device, the device's mps will set to default
>> +      * vaule(128 bytes). But the upstream port mps may be larger than 128B.
>> +      * In this case, we should update this device's mps for better performance.
>> +      */
>> +     pci_walk_bus(bus, pcie_bus_update_set, NULL);
>> +}
>> +
>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>   * parents then children fashion.  If this changes, then this code will not
>>   * work as designed.
>> @@ -1566,6 +1612,9 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>
>>       if (!pci_is_pcie(bus->self))
>>               return;
>> +
>> +     /* update mps setting for newly hot added device */
>> +     pcie_bus_update_setting(bus);
>>
>>       if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>               return;
>>
>
>
> --
> Thanks!
> Yijing
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] PCI: update device mps when doing pci hotplug
  2013-02-05  3:55 Yijing Wang
@ 2013-05-28  3:15 ` Yijing Wang
  2013-07-29 23:33   ` Bjorn Helgaas
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2013-05-28  3:15 UTC (permalink / raw)
  To: Bjorn Helgaas, Jon Mason
  Cc: Yijing Wang, linux-kernel, linux-pci, Yijing Wang, Hanjun Guo,
	jiang.liu, Joe Jin

Hi Bjorn and Jon,
   I'm sorry to disturb you. This patch is sent so long, but nobody seems had comment about it.
Do you have any comment with this patch?

This patch try to update device mps in following case:
1) target device under root port
   Because root port can split TLP, so target device mps greatr than root port mps is ok.
   But if root port mps greater than target device mps, it's bad, because target device cannot
   receive TLP payload size greater than its MPS. So if a target device under a root port, I think
   we should assign its mps greater than or equal root port mps.
2) target device under non root port
   We assume the target device both is a transmitter and receiver, so the safest way is to assign target
   device mps equal to its parent device.

Any comments about this patch is welcome!

Thanks!
Yijing.

On 2013/2/5 11:55, Yijing Wang wrote:
> Currently we dont't update device's mps vaule when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally.
> 
> The reference discussion at
> http://marc.info/?l=linux-pci&m=135420434508910&w=2
> and
> http://marc.info/?l=linux-pci&m=134815603407842&w=2
> 
> Reported-by: Joe Jin <joe.jin@oracle.com>
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> ---
>  drivers/pci/probe.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 49 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index bbe4be7..57d9a5b 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1556,6 +1556,52 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>  
> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
> +{
> +	int mps, p_mps;
> +
> +	if (!pci_is_pcie(dev) || !dev->bus->self)
> +		return 0;
> +
> +	mps = pcie_get_mps(dev);
> +	p_mps = pcie_get_mps(dev->bus->self);
> +
> +	if (pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT) {
> +		/* update mps when current device mps is not equal to upstream mps */
> +		if (mps != p_mps)
> +			goto update;
> +	} else {
> +		/* update mps when current device mps is smaller than upstream mps */
> +		if (mps < p_mps)
> +			goto update;
> +	}
> +
> +	return 0;
> +
> +update:
> +	/* If current mpss is lager than upstream, use upstream mps to update
> +	 * current mps, otherwise print warning info.
> +	 */
> +	if ((128 << dev->pcie_mpss) >= p_mps)
> +		pcie_write_mps(dev, p_mps);
> +	else
> +		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
> +				"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
> +				mps, 128 << dev->pcie_mpss, p_mps);
> +	return 0;
> +}
> +
> +static void pcie_bus_update_setting(struct pci_bus *bus)
> +{
> +
> +	/*
> +	 * After hot added a pci device, the device's mps will set to default
> +	 * vaule(128 bytes). But the upstream port mps may be larger than 128B.
> +	 * In this case, we should update this device's mps for better performance.
> +	 */
> +	pci_walk_bus(bus, pcie_bus_update_set, NULL);
> +}
> +
>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>   * parents then children fashion.  If this changes, then this code will not
>   * work as designed.
> @@ -1566,6 +1612,9 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>  
>  	if (!pci_is_pcie(bus->self))
>  		return;
> +
> +	/* update mps setting for newly hot added device */
> +	pcie_bus_update_setting(bus);
>  
>  	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>  		return;
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] PCI: update device mps when doing pci hotplug
@ 2013-02-05  3:55 Yijing Wang
  2013-05-28  3:15 ` Yijing Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Yijing Wang @ 2013-02-05  3:55 UTC (permalink / raw)
  To: Bjorn Helgaas, Jon Mason
  Cc: linux-kernel, linux-pci, Yijing Wang, Hanjun Guo, jiang.liu,
	Joe Jin, Yijing Wang

Currently we dont't update device's mps vaule when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally.

The reference discussion at
http://marc.info/?l=linux-pci&m=135420434508910&w=2
and
http://marc.info/?l=linux-pci&m=134815603407842&w=2

Reported-by: Joe Jin <joe.jin@oracle.com>
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
---
 drivers/pci/probe.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bbe4be7..57d9a5b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1556,6 +1556,52 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static int pcie_bus_update_set(struct pci_dev *dev, void *data)
+{
+	int mps, p_mps;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self)
+		return 0;
+
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(dev->bus->self);
+
+	if (pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT) {
+		/* update mps when current device mps is not equal to upstream mps */
+		if (mps != p_mps)
+			goto update;
+	} else {
+		/* update mps when current device mps is smaller than upstream mps */
+		if (mps < p_mps)
+			goto update;
+	}
+
+	return 0;
+
+update:
+	/* If current mpss is lager than upstream, use upstream mps to update
+	 * current mps, otherwise print warning info.
+	 */
+	if ((128 << dev->pcie_mpss) >= p_mps)
+		pcie_write_mps(dev, p_mps);
+	else
+		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
+				mps, 128 << dev->pcie_mpss, p_mps);
+	return 0;
+}
+
+static void pcie_bus_update_setting(struct pci_bus *bus)
+{
+
+	/*
+	 * After hot added a pci device, the device's mps will set to default
+	 * vaule(128 bytes). But the upstream port mps may be larger than 128B.
+	 * In this case, we should update this device's mps for better performance.
+	 */
+	pci_walk_bus(bus, pcie_bus_update_set, NULL);
+}
+
 /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
  * parents then children fashion.  If this changes, then this code will not
  * work as designed.
@@ -1566,6 +1612,9 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
 
 	if (!pci_is_pcie(bus->self))
 		return;
+
+	/* update mps setting for newly hot added device */
+	pcie_bus_update_setting(bus);
 
 	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
 		return;
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2014-10-02 15:45 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-29  8:17 [PATCH] PCI: update device mps when doing pci hotplug Yijing Wang
2014-07-29 16:18 ` Alex Williamson
2014-07-29 16:30   ` Keith Busch
2014-07-29 16:42     ` Alex Williamson
2014-07-29 19:04       ` Keith Busch
2014-07-30  3:35       ` Yijing Wang
2014-07-30  3:27   ` Yijing Wang
2014-07-30  3:33 ` Ethan Zhao
2014-07-30  3:42   ` Yijing Wang
2014-07-30  3:58     ` Ethan Zhao
2014-07-30  4:42       ` Yijing Wang
2014-07-30  6:26 ` Ethan Zhao
2014-07-30  6:57   ` Yijing Wang
2014-07-30  7:17     ` Ethan Zhao
2014-07-30  8:13       ` Yijing Wang
2014-07-30  8:38         ` Ethan Zhao
2014-07-30  9:17           ` Yijing Wang
2014-07-30 19:41             ` Jordan_Hargrave
2014-09-03 19:20               ` Bjorn Helgaas
2014-09-03 22:42 ` Bjorn Helgaas
2014-09-04  6:12   ` Yijing Wang
2014-09-04 13:16     ` Bjorn Helgaas
2014-09-05  1:27       ` Yijing Wang
2014-09-05 14:37         ` Keith Busch
2014-09-24 22:41         ` Keith Busch
2014-09-24 23:30           ` Bjorn Helgaas
2014-09-25  1:23             ` Yijing Wang
2014-09-25 16:46               ` Keith Busch
2014-09-26  3:22                 ` Yijing Wang
2014-10-02 15:31                   ` Jordan_Hargrave
  -- strict thread matches above, loose matches on Subject: below --
2014-07-29  8:23 Yijing Wang
2013-02-05  3:55 Yijing Wang
2013-05-28  3:15 ` Yijing Wang
2013-07-29 23:33   ` Bjorn Helgaas
2013-07-30  3:20     ` Yijing Wang
2013-07-30  3:42       ` Bjorn Helgaas
2013-07-30 22:29         ` Bjorn Helgaas
2013-07-31  9:15           ` Yijing Wang
2013-07-31 17:53             ` Bjorn Helgaas
2013-07-31 20:42               ` Bjorn Helgaas
2013-08-01  1:23                 ` Yijing Wang
2013-08-01  1:21               ` Yijing Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).