All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-05  7:15 ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
    some platform which has some issues could use the new flag to indicate
    that it is not safe to enable relaxed ordering attribute, then we need
    to clear the relaxed ordering enable bits in the PCI configuration when
    initializing the device. So add a new second patch to modify the PCI
    initialization code to clear the relaxed ordering enable bit in the
    event that the root complex doesn't want relaxed ordering enabled.

    The third patch was base on the v1's second patch and only be changed
    to query the relaxed ordering enable bit in the PCI configuration space
    to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
    set.

    This version didn't plan to drop the defines for Intel Drivers to use the
    new checking way to enable relaxed ordering because it is not the hardest
    part of the moment, we could fix it in next patchset when this patches
    reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
    If a PCIe device didn't enable the relaxed ordering attribute default,
    we should not do anything in the PCIe configuration, otherwise we
    should check if any of the devices above us do not support relaxed
    ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
    the result if we get a return that indicate that the relaxed ordering
    is not supported we should update our device to disable relaxed ordering
    in configuration space. If the device above us doesn't exist or isn't
    the PCIe device, we shouldn't do anything and skip updating relaxed ordering
    because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering
    according John's suggestion, and modify the description, use the true/false
    as the return value.

    We shouldn't enable relaxed ordering attribute by the setting in the root
    complex configuration space for PCIe device, so fix it for cxgb4.

    Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
    value, and add the check for VF device.

    Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
    Ordering Enable] in PCI Probe() routine, this will break our current
    solution for some platform which has problematic when enable the relaxed
    ordering attribute. According to the latest recommendations, remove the
    enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
    scene, but we agree to leave this problem until we really trigger it.

    Make this patch set base on 4.12 release version.

v8: Change the second patch title and description to make it more reasonable,
    add the acked-by from Alex and Ashok.

    Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.

    Make this patch set base on 4.13-rc2.

v9: The document (https://software.intel.com/sites/default/files/managed/9e/
    bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
    processors based on Broadwell/Haswell microarchitecture has the problem
    with Relaxed Ordering Attribute enabled, so add the whole list Device ID
    from Intel to the patch.

Casey Leedom (3):
  PCI: Add new PCIe Fabric End Node flag,
    PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (1):
  PCI: Disable PCIe Relaxed Ordering if unsupported

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 23 +++++++++----
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |  5 +--
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h     |  1 +
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    | 18 ++++++++++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c         |  3 ++
 drivers/pci/pci.c                                  | 29 +++++++++++++++++
 drivers/pci/probe.c                                | 37 +++++++++++++++++++++
 drivers/pci/quirks.c                               | 88 ++++++++++++++++++++++
 include/linux/pci.h                                |  4 +++
 10 files changed, 151 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-05  7:15 ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
    some platform which has some issues could use the new flag to indicate
    that it is not safe to enable relaxed ordering attribute, then we need
    to clear the relaxed ordering enable bits in the PCI configuration when
    initializing the device. So add a new second patch to modify the PCI
    initialization code to clear the relaxed ordering enable bit in the
    event that the root complex doesn't want relaxed ordering enabled.

    The third patch was base on the v1's second patch and only be changed
    to query the relaxed ordering enable bit in the PCI configuration space
    to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
    set.

    This version didn't plan to drop the defines for Intel Drivers to use the
    new checking way to enable relaxed ordering because it is not the hardest
    part of the moment, we could fix it in next patchset when this patches
    reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
    If a PCIe device didn't enable the relaxed ordering attribute default,
    we should not do anything in the PCIe configuration, otherwise we
    should check if any of the devices above us do not support relaxed
    ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
    the result if we get a return that indicate that the relaxed ordering
    is not supported we should update our device to disable relaxed ordering
    in configuration space. If the device above us doesn't exist or isn't
    the PCIe device, we shouldn't do anything and skip updating relaxed ordering
    because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering
    according John's suggestion, and modify the description, use the true/false
    as the return value.

    We shouldn't enable relaxed ordering attribute by the setting in the root
    complex configuration space for PCIe device, so fix it for cxgb4.

    Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
    value, and add the check for VF device.

    Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
    Ordering Enable] in PCI Probe() routine, this will break our current
    solution for some platform which has problematic when enable the relaxed
    ordering attribute. According to the latest recommendations, remove the
    enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
    scene, but we agree to leave this problem until we really trigger it.

    Make this patch set base on 4.12 release version.

v8: Change the second patch title and description to make it more reasonable,
    add the acked-by from Alex and Ashok.

    Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.

    Make this patch set base on 4.13-rc2.

v9: The document (https://software.intel.com/sites/default/files/managed/9e/
    bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
    processors based on Broadwell/Haswell microarchitecture has the problem
    with Relaxed Ordering Attribute enabled, so add the whole list Device ID
    from Intel to the patch.

Casey Leedom (3):
  PCI: Add new PCIe Fabric End Node flag,
    PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (1):
  PCI: Disable PCIe Relaxed Ordering if unsupported

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 23 +++++++++----
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |  5 +--
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h     |  1 +
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    | 18 ++++++++++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c         |  3 ++
 drivers/pci/pci.c                                  | 29 +++++++++++++++++
 drivers/pci/probe.c                                | 37 +++++++++++++++++++++
 drivers/pci/quirks.c                               | 88 ++++++++++++++++++++++
 include/linux/pci.h                                |  4 +++
 10 files changed, 151 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-05  7:15 ` Ding Tianhong
  (?)
@ 2017-08-05  7:15   ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

From: Casey Leedom <leedom@chelsio.com>

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targetted towards these affected root complexes. Current list
of affected parts include some Intel Xeon processors root complex which suffers from
flow control credits that result in performance issues. On these affected
parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
data-corruption.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/quirks.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 90 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..5c9e125 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,94 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 	 * the direct_complete optimization.
 	 */
 	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

From: Casey Leedom <leedom@chelsio.com>

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targetted towards these affected root complexes. Current list
of affected parts include some Intel Xeon processors root complex which suffers from
flow control credits that result in performance issues. On these affected
parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
data-corruption.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/quirks.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 90 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..5c9e125 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,94 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 	 * the direct_complete optimization.
 	 */
 	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

From: Casey Leedom <leedom@chelsio.com>

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targetted towards these affected root complexes. Current list
of affected parts include some Intel Xeon processors root complex which suffers from
flow control credits that result in performance issues. On these affected
parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
data-corruption.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/quirks.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 90 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..5c9e125 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4016,6 +4016,94 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel Xeon processors based on Broadwell/Haswell microarchitecture Root
+ * Complex has a Flow Control Credit issue which can cause performance
+ * problems with Upstream Transaction Layer Packets with Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f03, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f05, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f06, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f07, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f09, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0a, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0b, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0c, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0d, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f0e, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..412ec1c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -188,6 +188,8 @@ enum pci_dev_flags {
 	 * the direct_complete optimization.
 	 */
 	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't use Relaxed Ordering for TLPs directed@this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-05  7:15 ` Ding Tianhong
  (?)
@ 2017-08-05  7:15   ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device. And if the
device is probably running in a guest machine, we should do nothing.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
 drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..4f9d7c1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+					  PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+	u16 v;
+
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 					 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+	while (dev) {
+		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+			return true;
+
+		dev = dev->bus->self;
+	}
+
+	return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+	/* We should not alter the relaxed ordering bit for the VF */
+	if (dev->is_virtfn)
+		return;
+
+	/* If the releaxed ordering enable bit is not set, do nothing. */
+	if (!pcie_relaxed_ordering_supported(dev))
+		return;
+
+	if (pci_dev_should_disable_relaxed_ordering(dev)) {
+		pcie_clear_relaxed_ordering(dev);
+		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev);
+	pci_configure_relaxed_ordering(dev);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device. And if the
device is probably running in a guest machine, we should do nothing.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
 drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..4f9d7c1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+					  PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+	u16 v;
+
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 					 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+	while (dev) {
+		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+			return true;
+
+		dev = dev->bus->self;
+	}
+
+	return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+	/* We should not alter the relaxed ordering bit for the VF */
+	if (dev->is_virtfn)
+		return;
+
+	/* If the releaxed ordering enable bit is not set, do nothing. */
+	if (!pcie_relaxed_ordering_supported(dev))
+		return;
+
+	if (pci_dev_should_disable_relaxed_ordering(dev)) {
+		pcie_clear_relaxed_ordering(dev);
+		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev);
+	pci_configure_relaxed_ordering(dev);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device. And if the
device is probably running in a guest machine, we should do nothing.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
 drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
 drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..4f9d7c1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+					  PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+	u16 v;
+
+	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 					 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+	while (dev) {
+		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+			return true;
+
+		dev = dev->bus->self;
+	}
+
+	return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+	/* We should not alter the relaxed ordering bit for the VF */
+	if (dev->is_virtfn)
+		return;
+
+	/* If the releaxed ordering enable bit is not set, do nothing. */
+	if (!pcie_relaxed_ordering_supported(dev))
+		return;
+
+	if (pci_dev_should_disable_relaxed_ordering(dev)) {
+		pcie_clear_relaxed_ordering(dev);
+		dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev);
+	pci_configure_relaxed_ordering(dev);
 
 	memset(&hpp, 0, sizeof(hpp));
 	ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 3/4] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-05  7:15 ` Ding Tianhong
@ 2017-08-05  7:15   ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

From: Casey Leedom <leedom@chelsio.com>

cxgb4 Ethernet driver now queries PCIe configuration space to determine
if it can send TLPs to it with the Relaxed Ordering Attribute set.

Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
Device Control[Relaxed Ordering Enable] at probe routine, to make sure
the driver will not send the Relaxed Ordering TLPs to the Root Complex which
could not deal the Relaxed Ordering TLPs.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +++++++++++++++++------
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 +++--
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ef4be78..09ea62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -529,6 +529,7 @@ enum {                                 /* adapter flags */
 	USING_SOFT_PARAMS  = (1 << 6),
 	MASTER_PF          = (1 << 7),
 	FW_OFLD_CONN       = (1 << 9),
+	ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index e403fa1..391e484 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev)
 		    dev->name, adap->params.vpd.id, adap->name, buf);
 }
 
-static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
-{
-	pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
-}
-
 /*
  * Free the following resources:
  * - memory used for tables
@@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	pci_enable_pcie_error_reporting(pdev);
-	enable_pcie_relaxed_ordering(pdev);
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
@@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->msg_enable = DFLT_MSG_ENABLE;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * The PCIe devices which under the Root Complexes will be cleared the
+	 * Relaxed Ordering bit in the configuration space, So we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->tid_release_lock);
 	spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index ede1220..4ef68f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 	struct fw_iq_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/* Size needs to be multiple of 16, including status entry. */
 	iq->size = roundup(iq->size, 16);
@@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
 		flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
 		c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-					     FW_IQ_CMD_FL0FETCHRO_F |
-					     FW_IQ_CMD_FL0DATARO_F |
+					     FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+					     FW_IQ_CMD_FL0DATARO_V(relaxed) |
 					     FW_IQ_CMD_FL0PADEN_F);
 		if (cong >= 0)
 			c.iqns_to_fl0congen |=
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 3/4] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

From: Casey Leedom <leedom@chelsio.com>

cxgb4 Ethernet driver now queries PCIe configuration space to determine
if it can send TLPs to it with the Relaxed Ordering Attribute set.

Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
Device Control[Relaxed Ordering Enable] at probe routine, to make sure
the driver will not send the Relaxed Ordering TLPs to the Root Complex which
could not deal the Relaxed Ordering TLPs.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 23 +++++++++++++++++------
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 +++--
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ef4be78..09ea62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -529,6 +529,7 @@ enum {                                 /* adapter flags */
 	USING_SOFT_PARAMS  = (1 << 6),
 	MASTER_PF          = (1 << 7),
 	FW_OFLD_CONN       = (1 << 9),
+	ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index e403fa1..391e484 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4654,11 +4654,6 @@ static void print_port_info(const struct net_device *dev)
 		    dev->name, adap->params.vpd.id, adap->name, buf);
 }
 
-static void enable_pcie_relaxed_ordering(struct pci_dev *dev)
-{
-	pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
-}
-
 /*
  * Free the following resources:
  * - memory used for tables
@@ -4908,7 +4903,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	pci_enable_pcie_error_reporting(pdev);
-	enable_pcie_relaxed_ordering(pdev);
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
@@ -4947,6 +4941,23 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->msg_enable = DFLT_MSG_ENABLE;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * The PCIe devices which under the Root Complexes will be cleared the
+	 * Relaxed Ordering bit in the configuration space, So we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->tid_release_lock);
 	spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index ede1220..4ef68f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2719,6 +2719,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 	struct fw_iq_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/* Size needs to be multiple of 16, including status entry. */
 	iq->size = roundup(iq->size, 16);
@@ -2772,8 +2773,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
 		flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
 		c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-					     FW_IQ_CMD_FL0FETCHRO_F |
-					     FW_IQ_CMD_FL0DATARO_F |
+					     FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+					     FW_IQ_CMD_FL0DATARO_V(relaxed) |
 					     FW_IQ_CMD_FL0PADEN_F);
 		if (cong >= 0)
 			c.iqns_to_fl0congen |=
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 4/4] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-05  7:15 ` Ding Tianhong
  (?)
@ 2017-08-05  7:15   ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

From: Casey Leedom <leedom@chelsio.com>

cxgb4vf Ethernet driver now queries PCIe configuration space to
determine if it can send TLPs to it with the Relaxed Ordering
Attribute set, just like the pf did.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c          |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
 	USING_MSI          = (1UL << 1),
 	USING_MSIX         = (1UL << 2),
 	QUEUES_BOUND       = (1UL << 3),
+	ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..59e7639 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	 */
 	adapter->name = pci_name(pdev);
 	adapter->msg_enable = DFLT_MSG_ENABLE;
+
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * The PCIe devices which under the Root Complexes will be cleared the
+	 * Relaxed Ordering bit in the configuration space, So we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	err = adap_init0(adapter);
 	if (err)
 		goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 	struct port_info *pi = netdev_priv(dev);
 	struct fw_iq_cmd cmd, rpl;
 	int ret, iqandst, flsz = 0;
+	int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/*
 	 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 			cpu_to_be32(
 				FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
 				FW_IQ_CMD_FL0PACKEN_F |
+				FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+				FW_IQ_CMD_FL0DATARO_V(relaxed) |
 				FW_IQ_CMD_FL0PADEN_F);
 
 		/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 4/4] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm
  Cc: Ding Tianhong

From: Casey Leedom <leedom@chelsio.com>

cxgb4vf Ethernet driver now queries PCIe configuration space to
determine if it can send TLPs to it with the Relaxed Ordering
Attribute set, just like the pf did.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c          |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
 	USING_MSI          = (1UL << 1),
 	USING_MSIX         = (1UL << 2),
 	QUEUES_BOUND       = (1UL << 3),
+	ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..59e7639 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	 */
 	adapter->name = pci_name(pdev);
 	adapter->msg_enable = DFLT_MSG_ENABLE;
+
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * The PCIe devices which under the Root Complexes will be cleared the
+	 * Relaxed Ordering bit in the configuration space, So we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	err = adap_init0(adapter);
 	if (err)
 		goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 	struct port_info *pi = netdev_priv(dev);
 	struct fw_iq_cmd cmd, rpl;
 	int ret, iqandst, flsz = 0;
+	int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/*
 	 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 			cpu_to_be32(
 				FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
 				FW_IQ_CMD_FL0PACKEN_F |
+				FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+				FW_IQ_CMD_FL0DATARO_V(relaxed) |
 				FW_IQ_CMD_FL0PADEN_F);
 
 		/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH v9 4/4] net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-05  7:15   ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-05  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

From: Casey Leedom <leedom@chelsio.com>

cxgb4vf Ethernet driver now queries PCIe configuration space to
determine if it can send TLPs to it with the Relaxed Ordering
Attribute set, just like the pf did.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 18 ++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c          |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 109bc63..08c6ddb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -408,6 +408,7 @@ enum { /* adapter flags */
 	USING_MSI          = (1UL << 1),
 	USING_MSIX         = (1UL << 2),
 	QUEUES_BOUND       = (1UL << 3),
+	ROOT_NO_RELAXED_ORDERING = (1UL << 4),
 };
 
 /*
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ac7a150..59e7639 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2888,6 +2888,24 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 	 */
 	adapter->name = pci_name(pdev);
 	adapter->msg_enable = DFLT_MSG_ENABLE;
+
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteeing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * The PCIe devices which under the Root Complexes will be cleared the
+	 * Relaxed Ordering bit in the configuration space, So we check our
+	 * PCIe configuration space to see if it's flagged with advice against
+	 * using Relaxed Ordering.
+	 */
+	if (!pcie_relaxed_ordering_supported(pdev))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	err = adap_init0(adapter);
 	if (err)
 		goto err_unmap_bar;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index e37dde2..05498e7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -2205,6 +2205,7 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 	struct port_info *pi = netdev_priv(dev);
 	struct fw_iq_cmd cmd, rpl;
 	int ret, iqandst, flsz = 0;
+	int relaxed = !(adapter->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/*
 	 * If we're using MSI interrupts and we're not initializing the
@@ -2300,6 +2301,8 @@ int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
 			cpu_to_be32(
 				FW_IQ_CMD_FL0HOSTFCMODE_V(SGE_HOSTFCMODE_NONE) |
 				FW_IQ_CMD_FL0PACKEN_F |
+				FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+				FW_IQ_CMD_FL0DATARO_V(relaxed) |
 				FW_IQ_CMD_FL0PADEN_F);
 
 		/* In T6, for egress queue type FL there is internal overhead
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-05  7:15 ` Ding Tianhong
@ 2017-08-07  3:47   ` David Miller
  -1 siblings, 0 replies; 70+ messages in thread
From: David Miller @ 2017-08-07  3:47 UTC (permalink / raw)
  To: dingtianhong
  Cc: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 5 Aug 2017 15:15:09 +0800

> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
> 
> It's been years since I've submitted kernel.org patches, I appolgise for the
> almost certain submission errors.

Which tree should merge this?  The PCI tree or my networking tree?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-07  3:47   ` David Miller
  0 siblings, 0 replies; 70+ messages in thread
From: David Miller @ 2017-08-07  3:47 UTC (permalink / raw)
  To: linux-arm-kernel

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 5 Aug 2017 15:15:09 +0800

> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
> 
> It's been years since I've submitted kernel.org patches, I appolgise for the
> almost certain submission errors.

Which tree should merge this?  The PCI tree or my networking tree?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-07  3:47   ` David Miller
  (?)
@ 2017-08-07  4:13     ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-07  4:13 UTC (permalink / raw)
  To: David Miller, bhelgaas, helgaas
  Cc: leedom, ashok.raj, werner, ganeshgr, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm



On 2017/8/7 11:47, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Sat, 5 Aug 2017 15:15:09 +0800
> 
>> Some devices have problems with Transaction Layer Packets with the Relaxed
>> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
>> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
>> devices with Relaxed Ordering issues, and a use of this new flag by the
>> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
>> Ports.
>>
>> It's been years since I've submitted kernel.org patches, I appolgise for the
>> almost certain submission errors.
> 
> Which tree should merge this?  The PCI tree or my networking tree?
> 

Hi David:

I think networking tree merge it is a better choice, as it mainly used to tell the NIC
drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
view. :)

Hi Bjorn:

Could you help review this patch or give some feedback ?

Thanks
Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-07  4:13     ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-07  4:13 UTC (permalink / raw)
  To: David Miller, bhelgaas, helgaas
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, werner, linux-arm-kernel, amira, netdev,
	linux-kernel, David.Laight, Suravee.Suthikulpanit, robin.murphy,
	l.stach

CgpPbiAyMDE3LzgvNyAxMTo0NywgRGF2aWQgTWlsbGVyIHdyb3RlOgo+IEZyb206IERpbmcgVGlh
bmhvbmcgPGRpbmd0aWFuaG9uZ0BodWF3ZWkuY29tPgo+IERhdGU6IFNhdCwgNSBBdWcgMjAxNyAx
NToxNTowOSArMDgwMAo+IAo+PiBTb21lIGRldmljZXMgaGF2ZSBwcm9ibGVtcyB3aXRoIFRyYW5z
YWN0aW9uIExheWVyIFBhY2tldHMgd2l0aCB0aGUgUmVsYXhlZAo+PiBPcmRlcmluZyBBdHRyaWJ1
dGUgc2V0LiAgVGhpcyBwYXRjaCBzZXQgYWRkcyBhIG5ldyBQQ0llIERldmljZSBGbGFnLAo+PiBQ
Q0lfREVWX0ZMQUdTX05PX1JFTEFYRURfT1JERVJJTkcsIGEgc2V0IG9mIFBDSSBRdWlya3MgdG8g
Y2F0Y2ggc29tZSBrbm93bgo+PiBkZXZpY2VzIHdpdGggUmVsYXhlZCBPcmRlcmluZyBpc3N1ZXMs
IGFuZCBhIHVzZSBvZiB0aGlzIG5ldyBmbGFnIGJ5IHRoZQo+PiBjeGdiNCBkcml2ZXIgdG8gYXZv
aWQgdXNpbmcgUmVsYXhlZCBPcmRlcmluZyB3aXRoIHByb2JsZW1hdGljIFJvb3QgQ29tcGxleAo+
PiBQb3J0cy4KPj4KPj4gSXQncyBiZWVuIHllYXJzIHNpbmNlIEkndmUgc3VibWl0dGVkIGtlcm5l
bC5vcmcgcGF0Y2hlcywgSSBhcHBvbGdpc2UgZm9yIHRoZQo+PiBhbG1vc3QgY2VydGFpbiBzdWJt
aXNzaW9uIGVycm9ycy4KPiAKPiBXaGljaCB0cmVlIHNob3VsZCBtZXJnZSB0aGlzPyAgVGhlIFBD
SSB0cmVlIG9yIG15IG5ldHdvcmtpbmcgdHJlZT8KPiAKCkhpIERhdmlk77yaCgpJIHRoaW5rIG5l
dHdvcmtpbmcgdHJlZSBtZXJnZSBpdCBpcyBhIGJldHRlciBjaG9pY2UsIGFzIGl0IG1haW5seSB1
c2VkIHRvIHRlbGwgdGhlIE5JQwpkcml2ZXJzIGhvdyB0byB1c2UgdGhlIFJlbGF4ZWQgT3JkZXJp
bmcgQXR0cmlidXRlLCBhbmQgbGF0ZXIgd2UgbmVlZCBzZW5kIHBhdGNoIHRvIGVuYWJsZQpSTyBm
b3IgaXhnYmUgZHJpdmVyIGJhc2Ugb24gdGhpcyBwYXRjaC4gQnV0IEkgYW0gbm90IHN1cmUgd2hl
dGhlciBCam9ybiBoYXMgc29tZSBvZiBoaXMgb3duCnZpZXcuIDopCgpIaSBCam9ybjoKCkNvdWxk
IHlvdSBoZWxwIHJldmlldyB0aGlzIHBhdGNoIG9yIGdpdmUgc29tZSBmZWVkYmFjayA/CgpUaGFu
a3MKRGluZwo+IC4KPiAKCgpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f
X19fX19fXwpsaW51eC1hcm0ta2VybmVsIG1haWxpbmcgbGlzdApsaW51eC1hcm0ta2VybmVsQGxp
c3RzLmluZnJhZGVhZC5vcmcKaHR0cDovL2xpc3RzLmluZnJhZGVhZC5vcmcvbWFpbG1hbi9saXN0
aW5mby9saW51eC1hcm0ta2VybmVsCg==

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-07  4:13     ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-07  4:13 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/8/7 11:47, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Sat, 5 Aug 2017 15:15:09 +0800
> 
>> Some devices have problems with Transaction Layer Packets with the Relaxed
>> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
>> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
>> devices with Relaxed Ordering issues, and a use of this new flag by the
>> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
>> Ports.
>>
>> It's been years since I've submitted kernel.org patches, I appolgise for the
>> almost certain submission errors.
> 
> Which tree should merge this?  The PCI tree or my networking tree?
> 

Hi David?

I think networking tree merge it is a better choice, as it mainly used to tell the NIC
drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
view. :)

Hi Bjorn:

Could you help review this patch or give some feedback ?

Thanks
Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-07  4:13     ` Ding Tianhong
@ 2017-08-07 21:14       ` David Miller
  -1 siblings, 0 replies; 70+ messages in thread
From: David Miller @ 2017-08-07 21:14 UTC (permalink / raw)
  To: dingtianhong
  Cc: bhelgaas, helgaas, leedom, ashok.raj, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Mon, 7 Aug 2017 12:13:17 +0800

> Hi David:
> 
> I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> view. :)
> 
> Hi Bjorn:
> 
> Could you help review this patch or give some feedback ?

I'm still waiting on this...

Bjorn?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-07 21:14       ` David Miller
  0 siblings, 0 replies; 70+ messages in thread
From: David Miller @ 2017-08-07 21:14 UTC (permalink / raw)
  To: linux-arm-kernel

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Mon, 7 Aug 2017 12:13:17 +0800

> Hi David?
> 
> I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> view. :)
> 
> Hi Bjorn:
> 
> Could you help review this patch or give some feedback ?

I'm still waiting on this...

Bjorn?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-08-07 21:14       ` David Miller
  (?)
@ 2017-08-08  1:56         ` Bjorn Helgaas
  -1 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08  1:56 UTC (permalink / raw)
  To: David Miller
  Cc: dingtianhong, bhelgaas, leedom, ashok.raj, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm

On Mon, Aug 07, 2017 at 02:14:48PM -0700, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Mon, 7 Aug 2017 12:13:17 +0800
> 
> > Hi David:
> > 
> > I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> > drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> > RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> > view. :)
> > 
> > Hi Bjorn:
> > 
> > Could you help review this patch or give some feedback ?
> 
> I'm still waiting on this...
> 
> Bjorn?

I was on vacation Friday-today, but I'll look at this series this week.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-08  1:56         ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08  1:56 UTC (permalink / raw)
  To: David Miller
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, dingtianhong, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	linux-pci, netdev, linux-kernel, David.Laight,
	Suravee.Suthikulpanit, robin.murphy, l.stach

T24gTW9uLCBBdWcgMDcsIDIwMTcgYXQgMDI6MTQ6NDhQTSAtMDcwMCwgRGF2aWQgTWlsbGVyIHdy
b3RlOgo+IEZyb206IERpbmcgVGlhbmhvbmcgPGRpbmd0aWFuaG9uZ0BodWF3ZWkuY29tPgo+IERh
dGU6IE1vbiwgNyBBdWcgMjAxNyAxMjoxMzoxNyArMDgwMAo+IAo+ID4gSGkgRGF2aWTvvJoKPiA+
IAo+ID4gSSB0aGluayBuZXR3b3JraW5nIHRyZWUgbWVyZ2UgaXQgaXMgYSBiZXR0ZXIgY2hvaWNl
LCBhcyBpdCBtYWlubHkgdXNlZCB0byB0ZWxsIHRoZSBOSUMKPiA+IGRyaXZlcnMgaG93IHRvIHVz
ZSB0aGUgUmVsYXhlZCBPcmRlcmluZyBBdHRyaWJ1dGUsIGFuZCBsYXRlciB3ZSBuZWVkIHNlbmQg
cGF0Y2ggdG8gZW5hYmxlCj4gPiBSTyBmb3IgaXhnYmUgZHJpdmVyIGJhc2Ugb24gdGhpcyBwYXRj
aC4gQnV0IEkgYW0gbm90IHN1cmUgd2hldGhlciBCam9ybiBoYXMgc29tZSBvZiBoaXMgb3duCj4g
PiB2aWV3LiA6KQo+ID4gCj4gPiBIaSBCam9ybjoKPiA+IAo+ID4gQ291bGQgeW91IGhlbHAgcmV2
aWV3IHRoaXMgcGF0Y2ggb3IgZ2l2ZSBzb21lIGZlZWRiYWNrID8KPiAKPiBJJ20gc3RpbGwgd2Fp
dGluZyBvbiB0aGlzLi4uCj4gCj4gQmpvcm4/CgpJIHdhcyBvbiB2YWNhdGlvbiBGcmlkYXktdG9k
YXksIGJ1dCBJJ2xsIGxvb2sgYXQgdGhpcyBzZXJpZXMgdGhpcyB3ZWVrLgoKX19fX19fX19fX19f
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbGludXgtYXJtLWtlcm5lbCBtYWls
aW5nIGxpc3QKbGludXgtYXJtLWtlcm5lbEBsaXN0cy5pbmZyYWRlYWQub3JnCmh0dHA6Ly9saXN0
cy5pbmZyYWRlYWQub3JnL21haWxtYW4vbGlzdGluZm8vbGludXgtYXJtLWtlcm5lbAo=

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-08-08  1:56         ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08  1:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 07, 2017 at 02:14:48PM -0700, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Mon, 7 Aug 2017 12:13:17 +0800
> 
> > Hi David?
> > 
> > I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> > drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> > RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> > view. :)
> > 
> > Hi Bjorn:
> > 
> > Could you help review this patch or give some feedback ?
> 
> I'm still waiting on this...
> 
> Bjorn?

I was on vacation Friday-today, but I'll look at this series this week.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-05  7:15   ` Ding Tianhong
  (?)
@ 2017-08-08 23:22     ` Bjorn Helgaas
  -1 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08 23:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: leedom, ashok.raj, bhelgaas, werner, ganeshgr, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> From: Casey Leedom <leedom@chelsio.com>
> 
> The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
> Relaxed Ordering (RO) attribute should not be used for Transaction Layer
> Packets (TLP) targetted towards these affected root complexes. Current list
> of affected parts include some Intel Xeon processors root complex which suffers from
> flow control credits that result in performance issues. On these affected
> parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
> Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> data-corruption.

This needs to include a link to the Intel spec
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
sec 3.9.1).

It should also include a pointer to the AMD erratum, if available, or
at least some reference to how we know it doesn't obey the rules.

Ashok, thanks for chiming in.  Now that you have, I have a few more
questions for you:

  - Is the above doc the one you mentioned as being now public?
  
  - Is this considered a hardware erratum?
  
  - If so, is there a pointer to that as well?
  
  - If this is not considered an erratum, can you provide any guidance
    about how an OS should determine when it should use RO?
    
Relying on a list of device IDs in an optimization manual is OK for an
erratum, but if it's *not* an erratum, it seems like a hole in the
specs because as far as I know there's no generic way for the OS to
discover whether to use RO.

Bjorn

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-08 23:22     ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08 23:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> From: Casey Leedom <leedom@chelsio.com>
> 
> The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
> Relaxed Ordering (RO) attribute should not be used for Transaction Layer
> Packets (TLP) targetted towards these affected root complexes. Current list
> of affected parts include some Intel Xeon processors root complex which suffers from
> flow control credits that result in performance issues. On these affected
> parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
> Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> data-corruption.

This needs to include a link to the Intel spec
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
sec 3.9.1).

It should also include a pointer to the AMD erratum, if available, or
at least some reference to how we know it doesn't obey the rules.

Ashok, thanks for chiming in.  Now that you have, I have a few more
questions for you:

  - Is the above doc the one you mentioned as being now public?
  
  - Is this considered a hardware erratum?
  
  - If so, is there a pointer to that as well?
  
  - If this is not considered an erratum, can you provide any guidance
    about how an OS should determine when it should use RO?
    
Relying on a list of device IDs in an optimization manual is OK for an
erratum, but if it's *not* an erratum, it seems like a hole in the
specs because as far as I know there's no generic way for the OS to
discover whether to use RO.

Bjorn

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-08 23:22     ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-08 23:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> From: Casey Leedom <leedom@chelsio.com>
> 
> The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
> Relaxed Ordering (RO) attribute should not be used for Transaction Layer
> Packets (TLP) targetted towards these affected root complexes. Current list
> of affected parts include some Intel Xeon processors root complex which suffers from
> flow control credits that result in performance issues. On these affected
> parts RO can still be used for peer-2-peer traffic. AMD A1100 ARM ("SEATTLE")
> Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> data-corruption.

This needs to include a link to the Intel spec
(https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
sec 3.9.1).

It should also include a pointer to the AMD erratum, if available, or
at least some reference to how we know it doesn't obey the rules.

Ashok, thanks for chiming in.  Now that you have, I have a few more
questions for you:

  - Is the above doc the one you mentioned as being now public?
  
  - Is this considered a hardware erratum?
  
  - If so, is there a pointer to that as well?
  
  - If this is not considered an erratum, can you provide any guidance
    about how an OS should determine when it should use RO?
    
Relying on a list of device IDs in an optimization manual is OK for an
erratum, but if it's *not* an erratum, it seems like a hole in the
specs because as far as I know there's no generic way for the OS to
discover whether to use RO.

Bjorn

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-08 23:22     ` Bjorn Helgaas
  (?)
  (?)
@ 2017-08-09  1:40       ` Casey Leedom
  -1 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09  1:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 4:22 PM
| 
| This needs to include a link to the Intel spec
| (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
| sec 3.9.1).

  In the commit message or as a comment?  Regardless, I agree.  It's always
nice to be able to go back and see what the official documentation says.
However, that said, links on the internet are ... fragile as time goes by,
so we might want to simply quote section 3.9.1 in the commit message since
it's relatively short:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

| It should also include a pointer to the AMD erratum, if available, or
| at least some reference to how we know it doesn't obey the rules.

  Getting an ACK from AMD seems like a forlorn cause at this point.  My
contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
messages almost a year ago saying that all of AMD's energies were being
redirected towards upcoming x86 products (likely Ryzen as we now know).  As
far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
SoC.

  On the specific issue, I can certainly write up somthing even more
extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
review the comment I wrote up and tell me if you'd like something even more
detailed -- I'm usually acused of writing comments which are too long, so
this would be a new one on me ... :-)

| Ashok, thanks for chiming in.  Now that you have, I have a few more
| questions for you:

  I can answer a few of these:

|  - Is the above doc the one you mentioned as being now public?

  Yes.  Ashok worked with me to the extent he was allowed prior to the
publishing of the public technocal note, but he couldn't say much.  (Believe
it or not, it is possible to say less than the quoted section above.)  When
the note was published, Patrick Cramer sent me the note about it and pointed
me at section 3.9.1.

|  - Is this considered a hardware erratum?

  I certainly consider it a Hardware Bug.  And I'm really hoping that Ashok
will be able to find a "Chicken Bit" which allows the broken feature to be
turned off.  Remember, the Relaxed Ordering Attribute on a Transaction Layer
Packet is simply a HINT.  It is perfectly reasonable for a compliant
implementation to simply ignore the Relaxed Ordering Attribute on an
incoming TLP Request.  The sole responsibility of a compliant implementation
is to return the exact same Relaxed Ordering and No Snoop Attributes in any
TLP Response (The rules for ID-Based Ordering Attribute are more complex.)
 
  Earlier Intel Root Complexes did exactly this: they ignored the Relaxed
Ordering Attribute and there was no performance difference for
using/not-using it.  It's pretty obvious that an attempt was made to
implement optimizations surounding the use of Relaxed Ordering and they
didn't work.

|  - If so, is there a pointer to that as well?

  Intel is historically tight-lipped about admiting any bugs/errata in their
products.  I'm guessing that the above quoted Section 3.9.1 is likely to be
all we ever get. The language above regarding TLPs targetting Coherent
Shared Memory are basically as much of an admission that they got it wrong
as we're going to get.  But heck, maybe we'll get lucky ...  Especially with
regard to the hoped for "Chicken Bit" ...

|  - If this is not considered an erratum, can you provide any guidance
|    about how an OS should determine when it should use RO?

  Software?  We don't need no stinking software!

  Sorry, I couldn't resist.

| Relying on a list of device IDs in an optimization manual is OK for an
| erratum, but if it's *not* an erratum, it seems like a hole in the specs
| because as far as I know there's no generic way for the OS to discover
| whether to use RO.

  Well, here's to hoping that Ashok and/or Patrick are able to offer more
detailed information ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  1:40       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09  1:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland@arm.com

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 4:22 PM
| 
| This needs to include a link to the Intel spec
| (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
| sec 3.9.1).

  In the commit message or as a comment?  Regardless, I agree.  It's always
nice to be able to go back and see what the official documentation says.
However, that said, links on the internet are ... fragile as time goes by,
so we might want to simply quote section 3.9.1 in the commit message since
it's relatively short:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

| It should also include a pointer to the AMD erratum, if available, or
| at least some reference to how we know it doesn't obey the rules.

  Getting an ACK from AMD seems like a forlorn cause at this point.  My
contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
messages almost a year ago saying that all of AMD's energies were being
redirected towards upcoming x86 products (likely Ryzen as we now know).  As
far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
SoC.

  On the specific issue, I can certainly write up somthing even more
extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
review the comment I wrote up and tell me if you'd like something even more
detailed -- I'm usually acused of writing comments which are too long, so
this would be a new one on me ... :-)

| Ashok, thanks for chiming in.  Now that you have, I have a few more
| questions for you:

  I can answer a few of these:

|  - Is the above doc the one you mentioned as being now public?

  Yes.  Ashok worked with me to the extent he was allowed prior to the
publishing of the public technocal note, but he couldn't say much.  (Believe
it or not, it is possible to say less than the quoted section above.)  When
the note was published, Patrick Cramer sent me the note about it and pointed
me at section 3.9.1.

|  - Is this considered a hardware erratum?

  I certainly consider it a Hardware Bug.  And I'm really hoping that Ashok
will be able to find a "Chicken Bit" which allows the broken feature to be
turned off.  Remember, the Relaxed Ordering Attribute on a Transaction Layer
Packet is simply a HINT.  It is perfectly reasonable for a compliant
implementation to simply ignore the Relaxed Ordering Attribute on an
incoming TLP Request.  The sole responsibility of a compliant implementation
is to return the exact same Relaxed Ordering and No Snoop Attributes in any
TLP Response (The rules for ID-Based Ordering Attribute are more complex.)
 
  Earlier Intel Root Complexes did exactly this: they ignored the Relaxed
Ordering Attribute and there was no performance difference for
using/not-using it.  It's pretty obvious that an attempt was made to
implement optimizations surounding the use of Relaxed Ordering and they
didn't work.

|  - If so, is there a pointer to that as well?

  Intel is historically tight-lipped about admiting any bugs/errata in their
products.  I'm guessing that the above quoted Section 3.9.1 is likely to be
all we ever get. The language above regarding TLPs targetting Coherent
Shared Memory are basically as much of an admission that they got it wrong
as we're going to get.  But heck, maybe we'll get lucky ...  Especially with
regard to the hoped for "Chicken Bit" ...

|  - If this is not considered an erratum, can you provide any guidance
|    about how an OS should determine when it should use RO?

  Software?  We don't need no stinking software!

  Sorry, I couldn't resist.

| Relying on a list of device IDs in an optimization manual is OK for an
| erratum, but if it's *not* an erratum, it seems like a hole in the specs
| because as far as I know there's no generic way for the OS to discover
| whether to use RO.

  Well, here's to hoping that Ashok and/or Patrick are able to offer more
detailed information ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  1:40       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09  1:40 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 4:22 PM
|=20
| This needs to include a link to the Intel spec
| (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-ar=
chitectures-optimization-manual.pdf,
| sec 3.9.1).

  In the commit message or as a comment?  Regardless, I agree.  It's always
nice to be able to go back and see what the official documentation says.
However, that said, links on the internet are ... fragile as time goes by,
so we might want to simply quote section 3.9.1 in the commit message since
it's relatively short:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

| It should also include a pointer to the AMD erratum, if available, or
| at least some reference to how we know it doesn't obey the rules.

  Getting an ACK from AMD seems like a forlorn cause at this point.  My
contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
messages almost a year ago saying that all of AMD's energies were being
redirected towards upcoming x86 products (likely Ryzen as we now know).  As
far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
SoC.

  On the specific issue, I can certainly write up somthing even more
extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
review the comment I wrote up and tell me if you'd like something even more
detailed -- I'm usually acused of writing comments which are too long, so
this would be a new one on me ... :-)

| Ashok, thanks for chiming in.  Now that you have, I have a few more
| questions for you:

  I can answer a few of these:

|  - Is the above doc the one you mentioned as being now public?

  Yes.  Ashok worked with me to the extent he was allowed prior to the
publishing of the public technocal note, but he couldn't say much.  (Believ=
e
it or not, it is possible to say less than the quoted section above.)  When
the note was published, Patrick Cramer sent me the note about it and pointe=
d
me at section 3.9.1.

|  - Is this considered a hardware erratum?

  I certainly consider it a Hardware Bug.  And I'm really hoping that Ashok
will be able to find a "Chicken Bit" which allows the broken feature to be
turned off.  Remember, the Relaxed Ordering Attribute on a Transaction Laye=
r
Packet is simply a HINT.  It is perfectly reasonable for a compliant
implementation to simply ignore the Relaxed Ordering Attribute on an
incoming TLP Request.  The sole responsibility of a compliant implementatio=
n
is to return the exact same Relaxed Ordering and No Snoop Attributes in any
TLP Response (The rules for ID-Based Ordering Attribute are more complex.)
=20
  Earlier Intel Root Complexes did exactly this: they ignored the Relaxed
Ordering Attribute and there was no performance difference for
using/not-using it.  It's pretty obvious that an attempt was made to
implement optimizations surounding the use of Relaxed Ordering and they
didn't work.

|  - If so, is there a pointer to that as well?

  Intel is historically tight-lipped about admiting any bugs/errata in thei=
r
products.  I'm guessing that the above quoted Section 3.9.1 is likely to be
all we ever get. The language above regarding TLPs targetting Coherent
Shared Memory are basically as much of an admission that they got it wrong
as we're going to get.  But heck, maybe we'll get lucky ...  Especially wit=
h
regard to the hoped for "Chicken Bit" ...

|  - If this is not considered an erratum, can you provide any guidance
|    about how an OS should determine when it should use RO?

  Software?  We don't need no stinking software!

  Sorry, I couldn't resist.

| Relying on a list of device IDs in an optimization manual is OK for an
| erratum, but if it's *not* an erratum, it seems like a hole in the specs
| because as far as I know there's no generic way for the OS to discover
| whether to use RO.

  Well, here's to hoping that Ashok and/or Patrick are able to offer more
detailed information ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  1:40       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09  1:40 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 4:22 PM
| 
| This needs to include a link to the Intel spec
| (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
| sec 3.9.1).

  In the commit message or as a comment?  Regardless, I agree.  It's always
nice to be able to go back and see what the official documentation says.
However, that said, links on the internet are ... fragile as time goes by,
so we might want to simply quote section 3.9.1 in the commit message since
it's relatively short:

    3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
          and Toward MMIO Regions (P2P)

    In order to maximize performance for PCIe devices in the processors
    listed in Table 3-6 below, the soft- ware should determine whether the
    accesses are toward coherent memory (system memory) or toward MMIO
    regions (P2P access to other devices). If the access is toward MMIO
    region, then software can command HW to set the RO bit in the TLP
    header, as this would allow hardware to achieve maximum throughput for
    these types of accesses. For accesses toward coherent memory, software
    can command HW to clear the RO bit in the TLP header (no RO), as this
    would allow hardware to achieve maximum throughput for these types of
    accesses.

    Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
               PCIe Performance

    Processor                            CPU RP Device IDs

    Intel Xeon processors based on       6F01H-6F0EH
    Broadwell microarchitecture

    Intel Xeon processors based on       2F01H-2F0EH
    Haswell microarchitecture

| It should also include a pointer to the AMD erratum, if available, or
| at least some reference to how we know it doesn't obey the rules.

  Getting an ACK from AMD seems like a forlorn cause at this point.  My
contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
messages almost a year ago saying that all of AMD's energies were being
redirected towards upcoming x86 products (likely Ryzen as we now know).  As
far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
SoC.

  On the specific issue, I can certainly write up somthing even more
extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
review the comment I wrote up and tell me if you'd like something even more
detailed -- I'm usually acused of writing comments which are too long, so
this would be a new one on me ... :-)

| Ashok, thanks for chiming in.  Now that you have, I have a few more
| questions for you:

  I can answer a few of these:

|  - Is the above doc the one you mentioned as being now public?

  Yes.  Ashok worked with me to the extent he was allowed prior to the
publishing of the public technocal note, but he couldn't say much.  (Believe
it or not, it is possible to say less than the quoted section above.)  When
the note was published, Patrick Cramer sent me the note about it and pointed
me at section 3.9.1.

|  - Is this considered a hardware erratum?

  I certainly consider it a Hardware Bug.  And I'm really hoping that Ashok
will be able to find a "Chicken Bit" which allows the broken feature to be
turned off.  Remember, the Relaxed Ordering Attribute on a Transaction Layer
Packet is simply a HINT.  It is perfectly reasonable for a compliant
implementation to simply ignore the Relaxed Ordering Attribute on an
incoming TLP Request.  The sole responsibility of a compliant implementation
is to return the exact same Relaxed Ordering and No Snoop Attributes in any
TLP Response (The rules for ID-Based Ordering Attribute are more complex.)
 
  Earlier Intel Root Complexes did exactly this: they ignored the Relaxed
Ordering Attribute and there was no performance difference for
using/not-using it.  It's pretty obvious that an attempt was made to
implement optimizations surounding the use of Relaxed Ordering and they
didn't work.

|  - If so, is there a pointer to that as well?

  Intel is historically tight-lipped about admiting any bugs/errata in their
products.  I'm guessing that the above quoted Section 3.9.1 is likely to be
all we ever get. The language above regarding TLPs targetting Coherent
Shared Memory are basically as much of an admission that they got it wrong
as we're going to get.  But heck, maybe we'll get lucky ...  Especially with
regard to the hoped for "Chicken Bit" ...

|  - If this is not considered an erratum, can you provide any guidance
|    about how an OS should determine when it should use RO?

  Software?  We don't need no stinking software!

  Sorry, I couldn't resist.

| Relying on a list of device IDs in an optimization manual is OK for an
| erratum, but if it's *not* an erratum, it seems like a hole in the specs
| because as far as I know there's no generic way for the OS to discover
| whether to use RO.

  Well, here's to hoping that Ashok and/or Patrick are able to offer more
detailed information ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-05  7:15   ` Ding Tianhong
  (?)
@ 2017-08-09  2:22     ` Bjorn Helgaas
  -1 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  2:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: leedom, ashok.raj, bhelgaas, werner, ganeshgr, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. 

I think you only check the devices between the root port and the
target device.  For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means.  "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Acked-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"?  The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +					  PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> +	u16 v;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed.  This doesn't tell us anything about whether the
device *supports* relaxed ordering.  It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

  RO = <this Function supports relaxed ordering> &&
       <this transaction doesn't require strong write ordering> &&
       <PCI_EXP_DEVCTL_RELAX_EN is set>

The issue you're fixing is that some Completers don't handle RO
correctly.  The determining factor is not the Requester, but the
Completer (for this series, a Root Port).  So I think this should be
something like:

  int pcie_relaxed_ordering_broken(struct pci_dev *completer)
  {
    if (!completer)
      return 0;

    return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
  }

and the caller should do something like this:

 if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
   adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>  					 PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> +	while (dev) {
> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> +			return true;
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return false;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the releaxed ordering enable bit is not set, do nothing. */

s/releaxed/relaxed/

> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");

This associates the message with the Requester that may potentially
use relaxed ordering.  But there's nothing wrong or unusual about the
Requester; the issue is with the *Completer*, so I think the message
should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
Maybe it should be both places; I dunno.

This implementation assumes the device only initiates transactions to
coherent memory, i.e., it assumes the device never does peer-to-peer
DMA.  I guess we'll have to wait and see if we trip over any
peer-to-peer issues, then figure out how to handle them.

> +	}
> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	struct hotplug_params hpp;
> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  
>  	pci_configure_mps(dev);
>  	pci_configure_extended_tags(dev);
> +	pci_configure_relaxed_ordering(dev);
>  
>  	memset(&hpp, 0, sizeof(hpp));
>  	ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 412ec1c..3aa23a2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
>  void pci_pme_wakeup_bus(struct pci_bus *bus);
>  void pci_d3cold_enable(struct pci_dev *dev);
>  void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>  
>  /* PCI Virtual Channel */
>  int pci_save_vc_state(struct pci_dev *dev);
> -- 
> 1.8.3.1
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09  2:22     ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  2:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. 

I think you only check the devices between the root port and the
target device.  For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means.  "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Acked-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"?  The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +					  PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> +	u16 v;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed.  This doesn't tell us anything about whether the
device *supports* relaxed ordering.  It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

  RO = <this Function supports relaxed ordering> &&
       <this transaction doesn't require strong write ordering> &&
       <PCI_EXP_DEVCTL_RELAX_EN is set>

The issue you're fixing is that some Completers don't handle RO
correctly.  The determining factor is not the Requester, but the
Completer (for this series, a Root Port).  So I think this should be
something like:

  int pcie_relaxed_ordering_broken(struct pci_dev *completer)
  {
    if (!completer)
      return 0;

    return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
  }

and the caller should do something like this:

 if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
   adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>  					 PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> +	while (dev) {
> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> +			return true;
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return false;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the releaxed ordering enable bit is not set, do nothing. */

s/releaxed/relaxed/

> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");

This associates the message with the Requester that may potentially
use relaxed ordering.  But there's nothing wrong or unusual about the
Requester; the issue is with the *Completer*, so I think the message
should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
Maybe it should be both places; I dunno.

This implementation assumes the device only initiates transactions to
coherent memory, i.e., it assumes the device never does peer-to-peer
DMA.  I guess we'll have to wait and see if we trip over any
peer-to-peer issues, then figure out how to handle them.

> +	}
> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	struct hotplug_params hpp;
> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  
>  	pci_configure_mps(dev);
>  	pci_configure_extended_tags(dev);
> +	pci_configure_relaxed_ordering(dev);
>  
>  	memset(&hpp, 0, sizeof(hpp));
>  	ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 412ec1c..3aa23a2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
>  void pci_pme_wakeup_bus(struct pci_bus *bus);
>  void pci_d3cold_enable(struct pci_dev *dev);
>  void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>  
>  /* PCI Virtual Channel */
>  int pci_save_vc_state(struct pci_dev *dev);
> -- 
> 1.8.3.1
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09  2:22     ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  2:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. 

I think you only check the devices between the root port and the
target device.  For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means.  "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Acked-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"?  The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> +	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +					  PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> +	u16 v;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> +	return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed.  This doesn't tell us anything about whether the
device *supports* relaxed ordering.  It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

  RO = <this Function supports relaxed ordering> &&
       <this transaction doesn't require strong write ordering> &&
       <PCI_EXP_DEVCTL_RELAX_EN is set>

The issue you're fixing is that some Completers don't handle RO
correctly.  The determining factor is not the Requester, but the
Completer (for this series, a Root Port).  So I think this should be
something like:

  int pcie_relaxed_ordering_broken(struct pci_dev *completer)
  {
    if (!completer)
      return 0;

    return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
  }

and the caller should do something like this:

 if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
   adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>  					 PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> +	while (dev) {
> +		if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> +			return true;
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return false;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +	/* We should not alter the relaxed ordering bit for the VF */
> +	if (dev->is_virtfn)
> +		return;
> +
> +	/* If the releaxed ordering enable bit is not set, do nothing. */

s/releaxed/relaxed/

> +	if (!pcie_relaxed_ordering_supported(dev))
> +		return;
> +
> +	if (pci_dev_should_disable_relaxed_ordering(dev)) {
> +		pcie_clear_relaxed_ordering(dev);
> +		dev_info(&dev->dev, "Disable Relaxed Ordering\n");

This associates the message with the Requester that may potentially
use relaxed ordering.  But there's nothing wrong or unusual about the
Requester; the issue is with the *Completer*, so I think the message
should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
Maybe it should be both places; I dunno.

This implementation assumes the device only initiates transactions to
coherent memory, i.e., it assumes the device never does peer-to-peer
DMA.  I guess we'll have to wait and see if we trip over any
peer-to-peer issues, then figure out how to handle them.

> +	}
> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	struct hotplug_params hpp;
> @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  
>  	pci_configure_mps(dev);
>  	pci_configure_extended_tags(dev);
> +	pci_configure_relaxed_ordering(dev);
>  
>  	memset(&hpp, 0, sizeof(hpp));
>  	ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 412ec1c..3aa23a2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
>  void pci_pme_wakeup_bus(struct pci_bus *bus);
>  void pci_d3cold_enable(struct pci_dev *dev);
>  void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
>  
>  /* PCI Virtual Channel */
>  int pci_save_vc_state(struct pci_dev *dev);
> -- 
> 1.8.3.1
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09  1:40       ` Casey Leedom
  (?)
  (?)
@ 2017-08-09  3:02         ` Bjorn Helgaas
  -1 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:02 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Ding Tianhong, ashok.raj, bhelgaas, Michael Werner, Ganesh GR,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm

On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Tuesday, August 8, 2017 4:22 PM
> | 
> | This needs to include a link to the Intel spec
> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> | sec 3.9.1).
> 
>   In the commit message or as a comment?  Regardless, I agree.  It's always
> nice to be able to go back and see what the official documentation says.
> However, that said, links on the internet are ... fragile as time goes by,
> so we might want to simply quote section 3.9.1 in the commit message since
> it's relatively short:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture

Agreed, links are prone to being broken.  I would include in the
changelog the complete title and order number, along with the link as
a footnote.  Wouldn't hurt to quote the section too, since it's short.

> | It should also include a pointer to the AMD erratum, if available, or
> | at least some reference to how we know it doesn't obey the rules.
> 
>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
> messages almost a year ago saying that all of AMD's energies were being
> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
> SoC.
> 
>   On the specific issue, I can certainly write up somthing even more
> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
> review the comment I wrote up and tell me if you'd like something even more
> detailed -- I'm usually acused of writing comments which are too long, so
> this would be a new one on me ... :-)

If you have any bug reports with info about how you debugged it and
concluded that Seattle is broken, you could include a link (probably
in the changelog).  But if there isn't anything, there isn't anything.

I might reorganize those patches as:

  1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
  sets it, and the current patch [2/4] that uses it.

  2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
  details.

  3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
  details.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  3:02         ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:02 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Ding Tianhong, ashok.raj, bhelgaas, Michael Werner, Ganesh GR,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon@arm.com

On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Tuesday, August 8, 2017 4:22 PM
> | 
> | This needs to include a link to the Intel spec
> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> | sec 3.9.1).
> 
>   In the commit message or as a comment?  Regardless, I agree.  It's always
> nice to be able to go back and see what the official documentation says.
> However, that said, links on the internet are ... fragile as time goes by,
> so we might want to simply quote section 3.9.1 in the commit message since
> it's relatively short:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture

Agreed, links are prone to being broken.  I would include in the
changelog the complete title and order number, along with the link as
a footnote.  Wouldn't hurt to quote the section too, since it's short.

> | It should also include a pointer to the AMD erratum, if available, or
> | at least some reference to how we know it doesn't obey the rules.
> 
>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
> messages almost a year ago saying that all of AMD's energies were being
> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
> SoC.
> 
>   On the specific issue, I can certainly write up somthing even more
> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
> review the comment I wrote up and tell me if you'd like something even more
> detailed -- I'm usually acused of writing comments which are too long, so
> this would be a new one on me ... :-)

If you have any bug reports with info about how you debugged it and
concluded that Seattle is broken, you could include a link (probably
in the changelog).  But if there isn't anything, there isn't anything.

I might reorganize those patches as:

  1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
  sets it, and the current patch [2/4] that uses it.

  2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
  details.

  3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
  details.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  3:02         ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:02 UTC (permalink / raw)
  To: Casey Leedom
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, Ding Tianhong, Ganesh GR, Bob.Shaw,
	patrick.j.cramer, bhelgaas, Michael Werner, linux-arm-kernel,
	amira, linux-pci, netdev, linux-kernel, David.Laight,
	Suravee.Suthikulpanit, robin.murphy, davem, l.stach

On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Tuesday, August 8, 2017 4:22 PM
> | 
> | This needs to include a link to the Intel spec
> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> | sec 3.9.1).
> 
>   In the commit message or as a comment?  Regardless, I agree.  It's always
> nice to be able to go back and see what the official documentation says.
> However, that said, links on the internet are ... fragile as time goes by,
> so we might want to simply quote section 3.9.1 in the commit message since
> it's relatively short:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture

Agreed, links are prone to being broken.  I would include in the
changelog the complete title and order number, along with the link as
a footnote.  Wouldn't hurt to quote the section too, since it's short.

> | It should also include a pointer to the AMD erratum, if available, or
> | at least some reference to how we know it doesn't obey the rules.
> 
>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
> messages almost a year ago saying that all of AMD's energies were being
> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
> SoC.
> 
>   On the specific issue, I can certainly write up somthing even more
> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
> review the comment I wrote up and tell me if you'd like something even more
> detailed -- I'm usually acused of writing comments which are too long, so
> this would be a new one on me ... :-)

If you have any bug reports with info about how you debugged it and
concluded that Seattle is broken, you could include a link (probably
in the changelog).  But if there isn't anything, there isn't anything.

I might reorganize those patches as:

  1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
  sets it, and the current patch [2/4] that uses it.

  2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
  details.

  3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
  details.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09  3:02         ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Tuesday, August 8, 2017 4:22 PM
> | 
> | This needs to include a link to the Intel spec
> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> | sec 3.9.1).
> 
>   In the commit message or as a comment?  Regardless, I agree.  It's always
> nice to be able to go back and see what the official documentation says.
> However, that said, links on the internet are ... fragile as time goes by,
> so we might want to simply quote section 3.9.1 in the commit message since
> it's relatively short:
> 
>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>           and Toward MMIO Regions (P2P)
> 
>     In order to maximize performance for PCIe devices in the processors
>     listed in Table 3-6 below, the soft- ware should determine whether the
>     accesses are toward coherent memory (system memory) or toward MMIO
>     regions (P2P access to other devices). If the access is toward MMIO
>     region, then software can command HW to set the RO bit in the TLP
>     header, as this would allow hardware to achieve maximum throughput for
>     these types of accesses. For accesses toward coherent memory, software
>     can command HW to clear the RO bit in the TLP header (no RO), as this
>     would allow hardware to achieve maximum throughput for these types of
>     accesses.
> 
>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>                PCIe Performance
> 
>     Processor                            CPU RP Device IDs
> 
>     Intel Xeon processors based on       6F01H-6F0EH
>     Broadwell microarchitecture
> 
>     Intel Xeon processors based on       2F01H-2F0EH
>     Haswell microarchitecture

Agreed, links are prone to being broken.  I would include in the
changelog the complete title and order number, along with the link as
a footnote.  Wouldn't hurt to quote the section too, since it's short.

> | It should also include a pointer to the AMD erratum, if available, or
> | at least some reference to how we know it doesn't obey the rules.
> 
>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
> messages almost a year ago saying that all of AMD's energies were being
> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
> SoC.
> 
>   On the specific issue, I can certainly write up somthing even more
> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
> review the comment I wrote up and tell me if you'd like something even more
> detailed -- I'm usually acused of writing comments which are too long, so
> this would be a new one on me ... :-)

If you have any bug reports with info about how you debugged it and
concluded that Seattle is broken, you could include a link (probably
in the changelog).  But if there isn't anything, there isn't anything.

I might reorganize those patches as:

  1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
  sets it, and the current patch [2/4] that uses it.

  2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
  details.

  3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
  details.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-09  2:22     ` Bjorn Helgaas
  (?)
@ 2017-08-09  3:25       ` Bjorn Helgaas
  -1 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:25 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> > When bit4 is set in the PCIe Device Control register, it indicates
> > whether the device is permitted to use relaxed ordering.
> > On some platforms using relaxed ordering can have performance issues or
> > due to erratum can cause data-corruption. In such cases devices must avoid
> > using relaxed ordering.
> > 
> > This patch checks if there is any node in the hierarchy that indicates that
> > using relaxed ordering is not safe. 
...

> > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> 
> This is misnamed.  This doesn't tell us anything about whether the
> device *supports* relaxed ordering.  It only tells us whether the
> device is *permitted* to use it.
> 
> When a device initiates a transaction, the hardware should set the RO
> bit in the TLP with logic something like this:
> 
>   RO = <this Function supports relaxed ordering> &&
>        <this transaction doesn't require strong write ordering> &&
>        <PCI_EXP_DEVCTL_RELAX_EN is set>
> 
> The issue you're fixing is that some Completers don't handle RO
> correctly.  The determining factor is not the Requester, but the
> Completer (for this series, a Root Port).  So I think this should be
> something like:
> 
>   int pcie_relaxed_ordering_broken(struct pci_dev *completer)
>   {
>     if (!completer)
>       return 0;
> 
>     return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>   }
> 
> and the caller should do something like this:
> 
>  if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
>    adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> That way it's obvious where the issue is, and it's obvious that the
> answer might be different for peer-to-peer transactions than it is for
> transactions to the root port, i.e., to coherent memory.

After looking at the driver, I wonder if it would be simpler like
this:

  int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
  {
    u16 ctl;

    pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl);
    return ctl & PCI_EXP_DEVCTL_RELAX_EN;
  }
  EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);

  static void pci_configure_relaxed_ordering(struct pci_dev *dev)
  {
    struct pci_dev *root;

    if (dev->is_virtfn)
      return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */

    if (!pcie_relaxed_ordering_enabled(dev))
      return;

    /*
     * For now, we only deal with Relaxed Ordering issues with Root
     * Ports.  Peer-to-peer DMA is another can of worms.
     */
    root = pci_find_pcie_root_port(dev);
    if (!root)
      return;

    if (root->relaxed_ordering_broken)
      pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
                                 PCI_EXP_DEVCTL_RELAX_EN);
  }

This doesn't check every intervening switch, but I don't think we know
about any issues except with root ports.

And the driver could do:

  if (!pcie_relaxed_ordering_enabled(pdev))
    adapter->flags |= ROOT_NO_RELAXED_ORDERING;

The driver code wouldn't show anything about coherent memory vs.
peer-to-peer, but we really don't have a clue about how to handle that
yet anyway.

I guess this is back to exactly what you proposed, except that I
changed the name of pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), which I think is slightly more
specific from the device's point of view.

Bjorn

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09  3:25       ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:25 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> > When bit4 is set in the PCIe Device Control register, it indicates
> > whether the device is permitted to use relaxed ordering.
> > On some platforms using relaxed ordering can have performance issues or
> > due to erratum can cause data-corruption. In such cases devices must avoid
> > using relaxed ordering.
> > 
> > This patch checks if there is any node in the hierarchy that indicates that
> > using relaxed ordering is not safe. 
...

> > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> 
> This is misnamed.  This doesn't tell us anything about whether the
> device *supports* relaxed ordering.  It only tells us whether the
> device is *permitted* to use it.
> 
> When a device initiates a transaction, the hardware should set the RO
> bit in the TLP with logic something like this:
> 
>   RO = <this Function supports relaxed ordering> &&
>        <this transaction doesn't require strong write ordering> &&
>        <PCI_EXP_DEVCTL_RELAX_EN is set>
> 
> The issue you're fixing is that some Completers don't handle RO
> correctly.  The determining factor is not the Requester, but the
> Completer (for this series, a Root Port).  So I think this should be
> something like:
> 
>   int pcie_relaxed_ordering_broken(struct pci_dev *completer)
>   {
>     if (!completer)
>       return 0;
> 
>     return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>   }
> 
> and the caller should do something like this:
> 
>  if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
>    adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> That way it's obvious where the issue is, and it's obvious that the
> answer might be different for peer-to-peer transactions than it is for
> transactions to the root port, i.e., to coherent memory.

After looking at the driver, I wonder if it would be simpler like
this:

  int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
  {
    u16 ctl;

    pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl);
    return ctl & PCI_EXP_DEVCTL_RELAX_EN;
  }
  EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);

  static void pci_configure_relaxed_ordering(struct pci_dev *dev)
  {
    struct pci_dev *root;

    if (dev->is_virtfn)
      return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */

    if (!pcie_relaxed_ordering_enabled(dev))
      return;

    /*
     * For now, we only deal with Relaxed Ordering issues with Root
     * Ports.  Peer-to-peer DMA is another can of worms.
     */
    root = pci_find_pcie_root_port(dev);
    if (!root)
      return;

    if (root->relaxed_ordering_broken)
      pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
                                 PCI_EXP_DEVCTL_RELAX_EN);
  }

This doesn't check every intervening switch, but I don't think we know
about any issues except with root ports.

And the driver could do:

  if (!pcie_relaxed_ordering_enabled(pdev))
    adapter->flags |= ROOT_NO_RELAXED_ORDERING;

The driver code wouldn't show anything about coherent memory vs.
peer-to-peer, but we really don't have a clue about how to handle that
yet anyway.

I guess this is back to exactly what you proposed, except that I
changed the name of pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), which I think is slightly more
specific from the device's point of view.

Bjorn

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09  3:25       ` Bjorn Helgaas
  0 siblings, 0 replies; 70+ messages in thread
From: Bjorn Helgaas @ 2017-08-09  3:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> > When bit4 is set in the PCIe Device Control register, it indicates
> > whether the device is permitted to use relaxed ordering.
> > On some platforms using relaxed ordering can have performance issues or
> > due to erratum can cause data-corruption. In such cases devices must avoid
> > using relaxed ordering.
> > 
> > This patch checks if there is any node in the hierarchy that indicates that
> > using relaxed ordering is not safe. 
...

> > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> 
> This is misnamed.  This doesn't tell us anything about whether the
> device *supports* relaxed ordering.  It only tells us whether the
> device is *permitted* to use it.
> 
> When a device initiates a transaction, the hardware should set the RO
> bit in the TLP with logic something like this:
> 
>   RO = <this Function supports relaxed ordering> &&
>        <this transaction doesn't require strong write ordering> &&
>        <PCI_EXP_DEVCTL_RELAX_EN is set>
> 
> The issue you're fixing is that some Completers don't handle RO
> correctly.  The determining factor is not the Requester, but the
> Completer (for this series, a Root Port).  So I think this should be
> something like:
> 
>   int pcie_relaxed_ordering_broken(struct pci_dev *completer)
>   {
>     if (!completer)
>       return 0;
> 
>     return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>   }
> 
> and the caller should do something like this:
> 
>  if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
>    adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> That way it's obvious where the issue is, and it's obvious that the
> answer might be different for peer-to-peer transactions than it is for
> transactions to the root port, i.e., to coherent memory.

After looking at the driver, I wonder if it would be simpler like
this:

  int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
  {
    u16 ctl;

    pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl);
    return ctl & PCI_EXP_DEVCTL_RELAX_EN;
  }
  EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);

  static void pci_configure_relaxed_ordering(struct pci_dev *dev)
  {
    struct pci_dev *root;

    if (dev->is_virtfn)
      return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */

    if (!pcie_relaxed_ordering_enabled(dev))
      return;

    /*
     * For now, we only deal with Relaxed Ordering issues with Root
     * Ports.  Peer-to-peer DMA is another can of worms.
     */
    root = pci_find_pcie_root_port(dev);
    if (!root)
      return;

    if (root->relaxed_ordering_broken)
      pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
                                 PCI_EXP_DEVCTL_RELAX_EN);
  }

This doesn't check every intervening switch, but I don't think we know
about any issues except with root ports.

And the driver could do:

  if (!pcie_relaxed_ordering_enabled(pdev))
    adapter->flags |= ROOT_NO_RELAXED_ORDERING;

The driver code wouldn't show anything about coherent memory vs.
peer-to-peer, but we really don't have a clue about how to handle that
yet anyway.

I guess this is back to exactly what you proposed, except that I
changed the name of pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), which I think is slightly more
specific from the device's point of view.

Bjorn

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09  3:02         ` Bjorn Helgaas
  (?)
  (?)
@ 2017-08-09 12:17           ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 12:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Casey Leedom
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm



On 2017/8/9 11:02, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
>> | From: Bjorn Helgaas <helgaas@kernel.org>
>> | Sent: Tuesday, August 8, 2017 4:22 PM
>> | 
>> | This needs to include a link to the Intel spec
>> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
>> | sec 3.9.1).
>>
>>   In the commit message or as a comment?  Regardless, I agree.  It's always
>> nice to be able to go back and see what the official documentation says.
>> However, that said, links on the internet are ... fragile as time goes by,
>> so we might want to simply quote section 3.9.1 in the commit message since
>> it's relatively short:
>>
>>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>>           and Toward MMIO Regions (P2P)
>>
>>     In order to maximize performance for PCIe devices in the processors
>>     listed in Table 3-6 below, the soft- ware should determine whether the
>>     accesses are toward coherent memory (system memory) or toward MMIO
>>     regions (P2P access to other devices). If the access is toward MMIO
>>     region, then software can command HW to set the RO bit in the TLP
>>     header, as this would allow hardware to achieve maximum throughput for
>>     these types of accesses. For accesses toward coherent memory, software
>>     can command HW to clear the RO bit in the TLP header (no RO), as this
>>     would allow hardware to achieve maximum throughput for these types of
>>     accesses.
>>
>>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>>                PCIe Performance
>>
>>     Processor                            CPU RP Device IDs
>>
>>     Intel Xeon processors based on       6F01H-6F0EH
>>     Broadwell microarchitecture
>>
>>     Intel Xeon processors based on       2F01H-2F0EH
>>     Haswell microarchitecture
> 
> Agreed, links are prone to being broken.  I would include in the
> changelog the complete title and order number, along with the link as
> a footnote.  Wouldn't hurt to quote the section too, since it's short.
> 

OK

>> | It should also include a pointer to the AMD erratum, if available, or
>> | at least some reference to how we know it doesn't obey the rules.
>>
>>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
>> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
>> messages almost a year ago saying that all of AMD's energies were being
>> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
>> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
>> SoC.
>>
>>   On the specific issue, I can certainly write up somthing even more
>> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
>> review the comment I wrote up and tell me if you'd like something even more
>> detailed -- I'm usually acused of writing comments which are too long, so
>> this would be a new one on me ... :-)
> 
> If you have any bug reports with info about how you debugged it and
> concluded that Seattle is broken, you could include a link (probably
> in the changelog).  But if there isn't anything, there isn't anything.
> 
> I might reorganize those patches as:
> 
>   1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
>   sets it, and the current patch [2/4] that uses it.
> 
>   2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
>   details.
> 
>   3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
>   details.
> 

OK, I could reorganize it, but still need the Casey to give me the link
for the Seattle, otherwise I could remove the AMD part and wait until
someone show it. Thanks

Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 12:17           ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 12:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Casey Leedom
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland@arm.com



On 2017/8/9 11:02, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
>> | From: Bjorn Helgaas <helgaas@kernel.org>
>> | Sent: Tuesday, August 8, 2017 4:22 PM
>> | 
>> | This needs to include a link to the Intel spec
>> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
>> | sec 3.9.1).
>>
>>   In the commit message or as a comment?  Regardless, I agree.  It's always
>> nice to be able to go back and see what the official documentation says.
>> However, that said, links on the internet are ... fragile as time goes by,
>> so we might want to simply quote section 3.9.1 in the commit message since
>> it's relatively short:
>>
>>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>>           and Toward MMIO Regions (P2P)
>>
>>     In order to maximize performance for PCIe devices in the processors
>>     listed in Table 3-6 below, the soft- ware should determine whether the
>>     accesses are toward coherent memory (system memory) or toward MMIO
>>     regions (P2P access to other devices). If the access is toward MMIO
>>     region, then software can command HW to set the RO bit in the TLP
>>     header, as this would allow hardware to achieve maximum throughput for
>>     these types of accesses. For accesses toward coherent memory, software
>>     can command HW to clear the RO bit in the TLP header (no RO), as this
>>     would allow hardware to achieve maximum throughput for these types of
>>     accesses.
>>
>>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>>                PCIe Performance
>>
>>     Processor                            CPU RP Device IDs
>>
>>     Intel Xeon processors based on       6F01H-6F0EH
>>     Broadwell microarchitecture
>>
>>     Intel Xeon processors based on       2F01H-2F0EH
>>     Haswell microarchitecture
> 
> Agreed, links are prone to being broken.  I would include in the
> changelog the complete title and order number, along with the link as
> a footnote.  Wouldn't hurt to quote the section too, since it's short.
> 

OK

>> | It should also include a pointer to the AMD erratum, if available, or
>> | at least some reference to how we know it doesn't obey the rules.
>>
>>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
>> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
>> messages almost a year ago saying that all of AMD's energies were being
>> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
>> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
>> SoC.
>>
>>   On the specific issue, I can certainly write up somthing even more
>> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
>> review the comment I wrote up and tell me if you'd like something even more
>> detailed -- I'm usually acused of writing comments which are too long, so
>> this would be a new one on me ... :-)
> 
> If you have any bug reports with info about how you debugged it and
> concluded that Seattle is broken, you could include a link (probably
> in the changelog).  But if there isn't anything, there isn't anything.
> 
> I might reorganize those patches as:
> 
>   1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
>   sets it, and the current patch [2/4] that uses it.
> 
>   2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
>   details.
> 
>   3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
>   details.
> 

OK, I could reorganize it, but still need the Casey to give me the link
for the Seattle, otherwise I could remove the AMD part and wait until
someone show it. Thanks

Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 12:17           ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 12:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Casey Leedom
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm



On 2017/8/9 11:02, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
>> | From: Bjorn Helgaas <helgaas@kernel.org>
>> | Sent: Tuesday, August 8, 2017 4:22 PM
>> | 
>> | This needs to include a link to the Intel spec
>> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
>> | sec 3.9.1).
>>
>>   In the commit message or as a comment?  Regardless, I agree.  It's always
>> nice to be able to go back and see what the official documentation says.
>> However, that said, links on the internet are ... fragile as time goes by,
>> so we might want to simply quote section 3.9.1 in the commit message since
>> it's relatively short:
>>
>>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>>           and Toward MMIO Regions (P2P)
>>
>>     In order to maximize performance for PCIe devices in the processors
>>     listed in Table 3-6 below, the soft- ware should determine whether the
>>     accesses are toward coherent memory (system memory) or toward MMIO
>>     regions (P2P access to other devices). If the access is toward MMIO
>>     region, then software can command HW to set the RO bit in the TLP
>>     header, as this would allow hardware to achieve maximum throughput for
>>     these types of accesses. For accesses toward coherent memory, software
>>     can command HW to clear the RO bit in the TLP header (no RO), as this
>>     would allow hardware to achieve maximum throughput for these types of
>>     accesses.
>>
>>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>>                PCIe Performance
>>
>>     Processor                            CPU RP Device IDs
>>
>>     Intel Xeon processors based on       6F01H-6F0EH
>>     Broadwell microarchitecture
>>
>>     Intel Xeon processors based on       2F01H-2F0EH
>>     Haswell microarchitecture
> 
> Agreed, links are prone to being broken.  I would include in the
> changelog the complete title and order number, along with the link as
> a footnote.  Wouldn't hurt to quote the section too, since it's short.
> 

OK

>> | It should also include a pointer to the AMD erratum, if available, or
>> | at least some reference to how we know it doesn't obey the rules.
>>
>>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
>> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
>> messages almost a year ago saying that all of AMD's energies were being
>> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
>> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
>> SoC.
>>
>>   On the specific issue, I can certainly write up somthing even more
>> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
>> review the comment I wrote up and tell me if you'd like something even more
>> detailed -- I'm usually acused of writing comments which are too long, so
>> this would be a new one on me ... :-)
> 
> If you have any bug reports with info about how you debugged it and
> concluded that Seattle is broken, you could include a link (probably
> in the changelog).  But if there isn't anything, there isn't anything.
> 
> I might reorganize those patches as:
> 
>   1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
>   sets it, and the current patch [2/4] that uses it.
> 
>   2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
>   details.
> 
>   3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
>   details.
> 

OK, I could reorganize it, but still need the Casey to give me the link
for the Seattle, otherwise I could remove the AMD part and wait until
someone show it. Thanks

Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 12:17           ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 12:17 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/8/9 11:02, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
>> | From: Bjorn Helgaas <helgaas@kernel.org>
>> | Sent: Tuesday, August 8, 2017 4:22 PM
>> | 
>> | This needs to include a link to the Intel spec
>> | (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
>> | sec 3.9.1).
>>
>>   In the commit message or as a comment?  Regardless, I agree.  It's always
>> nice to be able to go back and see what the official documentation says.
>> However, that said, links on the internet are ... fragile as time goes by,
>> so we might want to simply quote section 3.9.1 in the commit message since
>> it's relatively short:
>>
>>     3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>>           and Toward MMIO Regions (P2P)
>>
>>     In order to maximize performance for PCIe devices in the processors
>>     listed in Table 3-6 below, the soft- ware should determine whether the
>>     accesses are toward coherent memory (system memory) or toward MMIO
>>     regions (P2P access to other devices). If the access is toward MMIO
>>     region, then software can command HW to set the RO bit in the TLP
>>     header, as this would allow hardware to achieve maximum throughput for
>>     these types of accesses. For accesses toward coherent memory, software
>>     can command HW to clear the RO bit in the TLP header (no RO), as this
>>     would allow hardware to achieve maximum throughput for these types of
>>     accesses.
>>
>>     Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>>                PCIe Performance
>>
>>     Processor                            CPU RP Device IDs
>>
>>     Intel Xeon processors based on       6F01H-6F0EH
>>     Broadwell microarchitecture
>>
>>     Intel Xeon processors based on       2F01H-2F0EH
>>     Haswell microarchitecture
> 
> Agreed, links are prone to being broken.  I would include in the
> changelog the complete title and order number, along with the link as
> a footnote.  Wouldn't hurt to quote the section too, since it's short.
> 

OK

>> | It should also include a pointer to the AMD erratum, if available, or
>> | at least some reference to how we know it doesn't obey the rules.
>>
>>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
>> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
>> messages almost a year ago saying that all of AMD's energies were being
>> redirected towards upcoming x86 products (likely Ryzen as we now know).  As
>> far as I can tell AMD has walked away from their A1100 (AKA "Seattle") ARM
>> SoC.
>>
>>   On the specific issue, I can certainly write up somthing even more
>> extensive than I wrote up for the comment in drivers/pci/quirks.c.  Please
>> review the comment I wrote up and tell me if you'd like something even more
>> detailed -- I'm usually acused of writing comments which are too long, so
>> this would be a new one on me ... :-)
> 
> If you have any bug reports with info about how you debugged it and
> concluded that Seattle is broken, you could include a link (probably
> in the changelog).  But if there isn't anything, there isn't anything.
> 
> I might reorganize those patches as:
> 
>   1) Add a PCI_DEV_FLAGS_RELAXED_ORDERING_BROKEN flag, the quirk that
>   sets it, and the current patch [2/4] that uses it.
> 
>   2) Add the Intel DECLARE_PCI_FIXUP_CLASS_EARLY()s with the Intel
>   details.
> 
>   3) Add the AMD DECLARE_PCI_FIXUP_CLASS_EARLY()s with the AMD
>   details.
> 

OK, I could reorganize it, but still need the Casey to give me the link
for the Seattle, otherwise I could remove the AMD part and wait until
someone show it. Thanks

Ding
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-09  2:22     ` Bjorn Helgaas
  (?)
  (?)
@ 2017-08-09 12:33       ` Casey Leedom
  -1 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
| 
|     if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|         adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| 
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer.
    Which unfortunately would also prevent Peer-to-Peer use of the
    Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
    that the they were dealing with a broken Root Port implementation.
    And this would work even for a driver in a VM with an attached
    device since it would be able to see the PCIe Configuration Space
    for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
    Root Port Completers, it's perfectly possible that other devices
    could be broken -- say an NVMe Device which is not "Coherent
    Memory".  How would this fit into the framework APIs being
    described?

 B. I have yet to see a n example of how the currently proposed
    API framework would be used in a hybrid environment where
    TLPs to the Root Port would not use Relaxed Ordering, but TLPs
    to a Peer would use Relaxed Ordering.  So far its all been about
    using a "big hammer" to completely disable the use of Relaxed
    Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| 
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09 12:33       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland@arm.com

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
| 
|     if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|         adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| 
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer.
    Which unfortunately would also prevent Peer-to-Peer use of the
    Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
    that the they were dealing with a broken Root Port implementation.
    And this would work even for a driver in a VM with an attached
    device since it would be able to see the PCIe Configuration Space
    for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
    Root Port Completers, it's perfectly possible that other devices
    could be broken -- say an NVMe Device which is not "Coherent
    Memory".  How would this fit into the framework APIs being
    described?

 B. I have yet to see a n example of how the currently proposed
    API framework would be used in a hybrid environment where
    TLPs to the Root Port would not use Relaxed Ordering, but TLPs
    to a Peer would use Relaxed Ordering.  So far its all been about
    using a "big hammer" to completely disable the use of Relaxed
    Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| 
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09 12:33       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 12:33 UTC (permalink / raw)
  To: Bjorn Helgaas, Ding Tianhong
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
|=20
|     if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|         adapter->flags |=3D ROOT_NO_RELAXED_ORDERING;
|=20
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in som=
e
Hypervisor implementations I've seen, there's not even a synthetic Root Por=
t
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel travers=
e
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer.
    Which unfortunately would also prevent Peer-to-Peer use of the
    Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
    that the they were dealing with a broken Root Port implementation.
    And this would work even for a driver in a VM with an attached
    device since it would be able to see the PCIe Configuration Space
    for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
    Root Port Completers, it's perfectly possible that other devices
    could be broken -- say an NVMe Device which is not "Coherent
    Memory".  How would this fit into the framework APIs being
    described?

 B. I have yet to see a n example of how the currently proposed
    API framework would be used in a hybrid environment where
    TLPs to the Root Port would not use Relaxed Ordering, but TLPs
    to a Peer would use Relaxed Ordering.  So far its all been about
    using a "big hammer" to completely disable the use of Relaxed
    Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Devic=
e
attached to a VM Conundrum", that what we should really be thinking about i=
s
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
|=20
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09 12:33       ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Bjorn Helgaas <helgaas@kernel.org>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
| 
|     if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|         adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| 
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer.
    Which unfortunately would also prevent Peer-to-Peer use of the
    Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
    that the they were dealing with a broken Root Port implementation.
    And this would work even for a driver in a VM with an attached
    device since it would be able to see the PCIe Configuration Space
    for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
    Root Port Completers, it's perfectly possible that other devices
    could be broken -- say an NVMe Device which is not "Coherent
    Memory".  How would this fit into the framework APIs being
    described?

 B. I have yet to see a n example of how the currently proposed
    API framework would be used in a hybrid environment where
    TLPs to the Root Port would not use Relaxed Ordering, but TLPs
    to a Peer would use Relaxed Ordering.  So far its all been about
    using a "big hammer" to completely disable the use of Relaxed
    Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| 
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-09  2:22     ` Bjorn Helgaas
@ 2017-08-09 13:23       ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 13:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: leedom, ashok.raj, bhelgaas, werner, ganeshgr, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

Hi Bjorn:

On 2017/8/9 10:22, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>> When bit4 is set in the PCIe Device Control register, it indicates
>> whether the device is permitted to use relaxed ordering.
>> On some platforms using relaxed ordering can have performance issues or
>> due to erratum can cause data-corruption. In such cases devices must avoid
>> using relaxed ordering.
>>
>> This patch checks if there is any node in the hierarchy that indicates that
>> using relaxed ordering is not safe. 
> 
> I think you only check the devices between the root port and the
> target device.  For example, you don't check siblings or cousins of
> the target device.
> 

OK, update the description.

>> In such cases the patch turns off the
>> relaxed ordering by clearing the eapability for this device.
> 
> s/eapability/capability/
> 
>> And if the
>> device is probably running in a guest machine, we should do nothing.
> 
> I don't know what this sentence means.  "Probably running in a guest
> machine" doesn't really make sense, and there's nothing in your patch
> that explicitly checks for being in a guest machine.
> 

Alex noticed that we should do nothing if in the virtual machine because
the Root Complex is NULL at that time, so I think this word should be more
clearly here.

>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Acked-by: Ashok Raj <ashok.raj@intel.com>
>> ---
>>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index af0cc34..4f9d7c1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>  
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
> 
> Why "If possible"?  The bit is required to be RW or hardwired to zero,
> so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.
> 

OK

>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
...
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09 13:23       ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Bjorn:

On 2017/8/9 10:22, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>> When bit4 is set in the PCIe Device Control register, it indicates
>> whether the device is permitted to use relaxed ordering.
>> On some platforms using relaxed ordering can have performance issues or
>> due to erratum can cause data-corruption. In such cases devices must avoid
>> using relaxed ordering.
>>
>> This patch checks if there is any node in the hierarchy that indicates that
>> using relaxed ordering is not safe. 
> 
> I think you only check the devices between the root port and the
> target device.  For example, you don't check siblings or cousins of
> the target device.
> 

OK, update the description.

>> In such cases the patch turns off the
>> relaxed ordering by clearing the eapability for this device.
> 
> s/eapability/capability/
> 
>> And if the
>> device is probably running in a guest machine, we should do nothing.
> 
> I don't know what this sentence means.  "Probably running in a guest
> machine" doesn't really make sense, and there's nothing in your patch
> that explicitly checks for being in a guest machine.
> 

Alex noticed that we should do nothing if in the virtual machine because
the Root Complex is NULL at that time, so I think this word should be more
clearly here.

>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Acked-by: Ashok Raj <ashok.raj@intel.com>
>> ---
>>  drivers/pci/pci.c   | 29 +++++++++++++++++++++++++++++
>>  drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index af0cc34..4f9d7c1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>  
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
> 
> Why "If possible"?  The bit is required to be RW or hardwired to zero,
> so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.
> 

OK

>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
...
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
  2017-08-09  3:25       ` Bjorn Helgaas
@ 2017-08-09 13:42         ` Ding Tianhong
  -1 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 13:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
	patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
	netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

On 2017/8/9 11:25, Bjorn Helgaas wrote:
> On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
>> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>>> When bit4 is set in the PCIe Device Control register, it indicates

> After looking at the driver, I wonder if it would be simpler like
> this:
> 
>   int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
>   {
>     u16 ctl;
> 
>     pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl);
>     return ctl & PCI_EXP_DEVCTL_RELAX_EN;
>   }
>   EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
> 
>   static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>   {
>     struct pci_dev *root;
> 
>     if (dev->is_virtfn)
>       return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */
> 
>     if (!pcie_relaxed_ordering_enabled(dev))
>       return;
> 
>     /*
>      * For now, we only deal with Relaxed Ordering issues with Root
>      * Ports.  Peer-to-peer DMA is another can of worms.
>      */
>     root = pci_find_pcie_root_port(dev);
>     if (!root)
>       return;
> 
>     if (root->relaxed_ordering_broken)
>       pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>                                  PCI_EXP_DEVCTL_RELAX_EN);
>   }
> 
> This doesn't check every intervening switch, but I don't think we know
> about any issues except with root ports.
> 

Yes

> And the driver could do:
> 
>   if (!pcie_relaxed_ordering_enabled(pdev))
>     adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> The driver code wouldn't show anything about coherent memory vs.
> peer-to-peer, but we really don't have a clue about how to handle that
> yet anyway.
> 
> I guess this is back to exactly what you proposed, except that I
> changed the name of pcie_relaxed_ordering_supported() to
> pcie_relaxed_ordering_enabled(), which I think is slightly more
> specific from the device's point of view.
> 

OK, looks like we reach a consensus finally, I will follow your new opinion and resend, thanks.

Ding

> Bjorn
> 
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
@ 2017-08-09 13:42         ` Ding Tianhong
  0 siblings, 0 replies; 70+ messages in thread
From: Ding Tianhong @ 2017-08-09 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 2017/8/9 11:25, Bjorn Helgaas wrote:
> On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
>> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>>> When bit4 is set in the PCIe Device Control register, it indicates

> After looking at the driver, I wonder if it would be simpler like
> this:
> 
>   int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
>   {
>     u16 ctl;
> 
>     pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl);
>     return ctl & PCI_EXP_DEVCTL_RELAX_EN;
>   }
>   EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
> 
>   static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>   {
>     struct pci_dev *root;
> 
>     if (dev->is_virtfn)
>       return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */
> 
>     if (!pcie_relaxed_ordering_enabled(dev))
>       return;
> 
>     /*
>      * For now, we only deal with Relaxed Ordering issues with Root
>      * Ports.  Peer-to-peer DMA is another can of worms.
>      */
>     root = pci_find_pcie_root_port(dev);
>     if (!root)
>       return;
> 
>     if (root->relaxed_ordering_broken)
>       pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>                                  PCI_EXP_DEVCTL_RELAX_EN);
>   }
> 
> This doesn't check every intervening switch, but I don't think we know
> about any issues except with root ports.
> 

Yes

> And the driver could do:
> 
>   if (!pcie_relaxed_ordering_enabled(pdev))
>     adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> The driver code wouldn't show anything about coherent memory vs.
> peer-to-peer, but we really don't have a clue about how to handle that
> yet anyway.
> 
> I guess this is back to exactly what you proposed, except that I
> changed the name of pcie_relaxed_ordering_supported() to
> pcie_relaxed_ordering_enabled(), which I think is slightly more
> specific from the device's point of view.
> 

OK, looks like we reach a consensus finally, I will follow your new opinion and resend, thanks.

Ding

> Bjorn
> 
> .
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-08 23:22     ` Bjorn Helgaas
@ 2017-08-09 15:58       ` Raj, Ashok
  -1 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 15:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ding Tianhong, leedom, bhelgaas, werner, ganeshgr,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm, ashok.raj

Hi Bjorn

On Tue, Aug 08, 2017 at 06:22:00PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> > From: Casey Leedom <leedom@chelsio.com>
> > 
> > Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> > data-corruption.
> 
> This needs to include a link to the Intel spec
> (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> sec 3.9.1).
> 
> It should also include a pointer to the AMD erratum, if available, or
> at least some reference to how we know it doesn't obey the rules.
> 
> Ashok, thanks for chiming in.  Now that you have, I have a few more
> questions for you:
> 
>   - Is the above doc the one you mentioned as being now public?

Yes. 
>   
>   - Is this considered a hardware erratum?

I would think so. I have tried to pursue the publication in that direction
but it morphed into the optimization guide instead. Once it got into some
open doc i stopped pushing.. but will continue to get this into erratum. i do
agree that's the right place holder for this.

>   
>   - If so, is there a pointer to that as well?
>   
>   - If this is not considered an erratum, can you provide any guidance
>     about how an OS should determine when it should use RO?

The optimization guide states that it only applies to transactions targetting
system memory. For peer-2-peer RO is allowed and has perf upside.

As Casey pointed out in an earlier thread, we choose the heavy hammer approach
because there are some that can lead to data-corruption as opposed to perf
degradation. 

This looks ugly, but maybe we can have 2 flags. one that indicates its a strict
no-no, and one that says no to system memory only. That way driver can
determine when the device would turn the hint on in the TLP.

>     
> Relying on a list of device IDs in an optimization manual is OK for an
> erratum, but if it's *not* an erratum, it seems like a hole in the

Good point.. for this specific case its really an erratum, but for some
reason they made the decision to use this doc vs. the generic errata
data-sheet that would have been the preferred way to document.

> specs because as far as I know there's no generic way for the OS to
> discover whether to use RO.
> 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 15:58       ` Raj, Ashok
  0 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Bjorn

On Tue, Aug 08, 2017 at 06:22:00PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:10PM +0800, Ding Tianhong wrote:
> > From: Casey Leedom <leedom@chelsio.com>
> > 
> > Root complexes don't obey PCIe 3.0 ordering rules, hence could lead to
> > data-corruption.
> 
> This needs to include a link to the Intel spec
> (https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf,
> sec 3.9.1).
> 
> It should also include a pointer to the AMD erratum, if available, or
> at least some reference to how we know it doesn't obey the rules.
> 
> Ashok, thanks for chiming in.  Now that you have, I have a few more
> questions for you:
> 
>   - Is the above doc the one you mentioned as being now public?

Yes. 
>   
>   - Is this considered a hardware erratum?

I would think so. I have tried to pursue the publication in that direction
but it morphed into the optimization guide instead. Once it got into some
open doc i stopped pushing.. but will continue to get this into erratum. i do
agree that's the right place holder for this.

>   
>   - If so, is there a pointer to that as well?
>   
>   - If this is not considered an erratum, can you provide any guidance
>     about how an OS should determine when it should use RO?

The optimization guide states that it only applies to transactions targetting
system memory. For peer-2-peer RO is allowed and has perf upside.

As Casey pointed out in an earlier thread, we choose the heavy hammer approach
because there are some that can lead to data-corruption as opposed to perf
degradation. 

This looks ugly, but maybe we can have 2 flags. one that indicates its a strict
no-no, and one that says no to system memory only. That way driver can
determine when the device would turn the hint on in the TLP.

>     
> Relying on a list of device IDs in an optimization manual is OK for an
> erratum, but if it's *not* an erratum, it seems like a hole in the

Good point.. for this specific case its really an erratum, but for some
reason they made the decision to use this doc vs. the generic errata
data-sheet that would have been the preferred way to document.

> specs because as far as I know there's no generic way for the OS to
> discover whether to use RO.
> 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09 12:17           ` Ding Tianhong
  (?)
  (?)
@ 2017-08-09 16:36             ` Casey Leedom
  -1 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:36 UTC (permalink / raw)
  To: Ding Tianhong, Bjorn Helgaas
  Cc: ashok.raj, bhelgaas, Michael Werner, Ganesh GR, asit.k.mallick,
	patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw, l.stach,
	amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, August 9, 2017 5:17 AM
|
| On 2017/8/9 11:02, Bjorn Helgaas wrote:
| >
| > On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
| > >
| >> | From: Bjorn Helgaas <helgaas@kernel.org>
| >> | Sent: Tuesday, August 8, 2017 4:22 PM
| >> | ...
| >> | It should also include a pointer to the AMD erratum, if available, or
| >> | at least some reference to how we know it doesn't obey the rules.
| >>
| >>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
| >> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
| >> messages almost a year ago saying that all of AMD's energies were being
| >> redirected towards upcoming x86 products (likely Ryzen as we now know).
| >> As far as I can tell AMD has walked away from their A1100 (AKA
| >> "Seattle") ARM SoC.
| >>
| >>   On the specific issue, I can certainly write up somthing even more
| >> extensive than I wrote up for the comment in drivers/pci/quirks.c.
| >> Please review the comment I wrote up and tell me if you'd like
| >> something even more detailed -- I'm usually acused of writing comments
| >> which are too long, so this would be a new one on me ... :-)
| >
| > If you have any bug reports with info about how you debugged it and
| > concluded that Seattle is broken, you could include a link (probably
| > in the changelog).  But if there isn't anything, there isn't anything.
| ...
| OK, I could reorganize it, but still need the Casey to give me the link
| for the Seattle, otherwise I could remove the AMD part and wait until
| someone show it. Thanks

There are no links and I was never given an internal bug number at AMD.  As
I said, they stopped responding to my notes about a years ago saying that
they were moving the focus of all their people and no longer had resources
to pursue the issue.  Hopefully for them, Ryzen doesn't have the same
Data Corruption problem ...

As for how we diagnosed it, with our Ingress Packet delivery, we have the
Ingress Packet Data delivered (DMA Write) into Free List Buffers, and then
then a small message (DMA Write) to a "Response Queue" indicating delivery
of the Ingress Packet Data into the Free List Buffers.  The Transaction
Layer Packets which convey the Ingress Packet Data all have the Relaxed
Ordering Attribute set, while the following TLP carring the Ingress Data
delivery notification into the Response Queue does not have the Relaxed
Ordering Attribute set.

The rules for processing TLPs with and without the Relaxed Ordering
Attribute set are covered in Section 2.4.1 of the PCIe 3.0 specification
(Revision 3.0 November 10, 2010).  Table 2-34 "Ordering Rules Summary"
covers the cases where one TLP may "pass" (be proccessed earlier) than a
preceding TLP.  In the case we're talking about, we have a sequence of one
or more Posted DMA Write TLPs with the Relaxed Ordering Attribute set and a
following Posted DMA Write TLP without the Relaxed Ordering Attribute set.
Thus we need to look at the Row A, Column 2 cell of Table 2-34 governing
when a Posted Request may "pass" a preceeding Posted Request.  In that cell
we have:

    a) No
    b) Y/N

with the explanatory text:

    A2a    A Posted Request must not pass another Posted Request
           unless A2b applies.

    A2b    A Posted Request with RO[23] Set is permitted to pass
           another Posted Request[24].  A Posted Request with IDO
           Set is permitted to pass another Posted Request if the
           two Requester IDs are different.

    [23] In this section, "RO" is an abbreviation for the Relaxed
         Ordering Attribute field.

    [24] Some usages are enabled by not implementing this passing
         (see the No RO-enabled PR-PR Passing bit in Section
         7.8.15).

In our case, we were getting notifications of Ingress Packet Delivery in our
Response Queues, but not all of the Ingress Packet Data Posted DMA Write
TLPs had been processed yet by the Root Complex.  As a result, we were
picking up old stale memory data before those lagging Ingress Packet Data
TLPs could be processed.  This is a clear violation of the PCIe 3.0 TLP
processing rules outlined above.

Does that help?

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:36             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:36 UTC (permalink / raw)
  To: Ding Tianhong, Bjorn Helgaas
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, Ganesh GR, Bob.Shaw,
	patrick.j.cramer, bhelgaas, Michael Werner, linux-arm-kernel,
	amira,

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, August 9, 2017 5:17 AM
|
| On 2017/8/9 11:02, Bjorn Helgaas wrote:
| >
| > On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
| > >
| >> | From: Bjorn Helgaas <helgaas@kernel.org>
| >> | Sent: Tuesday, August 8, 2017 4:22 PM
| >> | ...
| >> | It should also include a pointer to the AMD erratum, if available, or
| >> | at least some reference to how we know it doesn't obey the rules.
| >>
| >>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
| >> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
| >> messages almost a year ago saying that all of AMD's energies were being
| >> redirected towards upcoming x86 products (likely Ryzen as we now know).
| >> As far as I can tell AMD has walked away from their A1100 (AKA
| >> "Seattle") ARM SoC.
| >>
| >>   On the specific issue, I can certainly write up somthing even more
| >> extensive than I wrote up for the comment in drivers/pci/quirks.c.
| >> Please review the comment I wrote up and tell me if you'd like
| >> something even more detailed -- I'm usually acused of writing comments
| >> which are too long, so this would be a new one on me ... :-)
| >
| > If you have any bug reports with info about how you debugged it and
| > concluded that Seattle is broken, you could include a link (probably
| > in the changelog).  But if there isn't anything, there isn't anything.
| ...
| OK, I could reorganize it, but still need the Casey to give me the link
| for the Seattle, otherwise I could remove the AMD part and wait until
| someone show it. Thanks

There are no links and I was never given an internal bug number at AMD.  As
I said, they stopped responding to my notes about a years ago saying that
they were moving the focus of all their people and no longer had resources
to pursue the issue.  Hopefully for them, Ryzen doesn't have the same
Data Corruption problem ...

As for how we diagnosed it, with our Ingress Packet delivery, we have the
Ingress Packet Data delivered (DMA Write) into Free List Buffers, and then
then a small message (DMA Write) to a "Response Queue" indicating delivery
of the Ingress Packet Data into the Free List Buffers.  The Transaction
Layer Packets which convey the Ingress Packet Data all have the Relaxed
Ordering Attribute set, while the following TLP carring the Ingress Data
delivery notification into the Response Queue does not have the Relaxed
Ordering Attribute set.

The rules for processing TLPs with and without the Relaxed Ordering
Attribute set are covered in Section 2.4.1 of the PCIe 3.0 specification
(Revision 3.0 November 10, 2010).  Table 2-34 "Ordering Rules Summary"
covers the cases where one TLP may "pass" (be proccessed earlier) than a
preceding TLP.  In the case we're talking about, we have a sequence of one
or more Posted DMA Write TLPs with the Relaxed Ordering Attribute set and a
following Posted DMA Write TLP without the Relaxed Ordering Attribute set.
Thus we need to look at the Row A, Column 2 cell of Table 2-34 governing
when a Posted Request may "pass" a preceeding Posted Request.  In that cell
we have:

    a) No
    b) Y/N

with the explanatory text:

    A2a    A Posted Request must not pass another Posted Request
           unless A2b applies.

    A2b    A Posted Request with RO[23] Set is permitted to pass
           another Posted Request[24].  A Posted Request with IDO
           Set is permitted to pass another Posted Request if the
           two Requester IDs are different.

    [23] In this section, "RO" is an abbreviation for the Relaxed
         Ordering Attribute field.

    [24] Some usages are enabled by not implementing this passing
         (see the No RO-enabled PR-PR Passing bit in Section
         7.8.15).

In our case, we were getting notifications of Ingress Packet Delivery in our
Response Queues, but not all of the Ingress Packet Data Posted DMA Write
TLPs had been processed yet by the Root Complex.  As a result, we were
picking up old stale memory data before those lagging Ingress Packet Data
TLPs could be processed.  This is a clear violation of the PCIe 3.0 TLP
processing rules outlined above.

Does that help?

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:36             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:36 UTC (permalink / raw)
  To: Ding Tianhong, Bjorn Helgaas
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, ashok.raj,
	jeffrey.t.kirsher, linux-pci, Ganesh GR, Bob.Shaw,
	patrick.j.cramer, bhelgaas, Michael Werner, linux-arm-kernel,
	amira, netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
	robin.murphy, davem, l.stach

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, August 9, 2017 5:17 AM
|
| On 2017/8/9 11:02, Bjorn Helgaas wrote:
| >
| > On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
| > >
| >> | From: Bjorn Helgaas <helgaas@kernel.org>
| >> | Sent: Tuesday, August 8, 2017 4:22 PM
| >> | ...
| >> | It should also include a pointer to the AMD erratum, if available, or
| >> | at least some reference to how we know it doesn't obey the rules.
| >>
| >>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
| >> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
| >> messages almost a year ago saying that all of AMD's energies were being
| >> redirected towards upcoming x86 products (likely Ryzen as we now know).
| >> As far as I can tell AMD has walked away from their A1100 (AKA
| >> "Seattle") ARM SoC.
| >>
| >>   On the specific issue, I can certainly write up somthing even more
| >> extensive than I wrote up for the comment in drivers/pci/quirks.c.
| >> Please review the comment I wrote up and tell me if you'd like
| >> something even more detailed -- I'm usually acused of writing comments
| >> which are too long, so this would be a new one on me ... :-)
| >
| > If you have any bug reports with info about how you debugged it and
| > concluded that Seattle is broken, you could include a link (probably
| > in the changelog).  But if there isn't anything, there isn't anything.
| ...
| OK, I could reorganize it, but still need the Casey to give me the link
| for the Seattle, otherwise I could remove the AMD part and wait until
| someone show it. Thanks

There are no links and I was never given an internal bug number at AMD.  As
I said, they stopped responding to my notes about a years ago saying that
they were moving the focus of all their people and no longer had resources
to pursue the issue.  Hopefully for them, Ryzen doesn't have the same
Data Corruption problem ...

As for how we diagnosed it, with our Ingress Packet delivery, we have the
Ingress Packet Data delivered (DMA Write) into Free List Buffers, and then
then a small message (DMA Write) to a "Response Queue" indicating delivery
of the Ingress Packet Data into the Free List Buffers.  The Transaction
Layer Packets which convey the Ingress Packet Data all have the Relaxed
Ordering Attribute set, while the following TLP carring the Ingress Data
delivery notification into the Response Queue does not have the Relaxed
Ordering Attribute set.

The rules for processing TLPs with and without the Relaxed Ordering
Attribute set are covered in Section 2.4.1 of the PCIe 3.0 specification
(Revision 3.0 November 10, 2010).  Table 2-34 "Ordering Rules Summary"
covers the cases where one TLP may "pass" (be proccessed earlier) than a
preceding TLP.  In the case we're talking about, we have a sequence of one
or more Posted DMA Write TLPs with the Relaxed Ordering Attribute set and a
following Posted DMA Write TLP without the Relaxed Ordering Attribute set.
Thus we need to look at the Row A, Column 2 cell of Table 2-34 governing
when a Posted Request may "pass" a preceeding Posted Request.  In that cell
we have:

    a) No
    b) Y/N

with the explanatory text:

    A2a    A Posted Request must not pass another Posted Request
           unless A2b applies.

    A2b    A Posted Request with RO[23] Set is permitted to pass
           another Posted Request[24].  A Posted Request with IDO
           Set is permitted to pass another Posted Request if the
           two Requester IDs are different.

    [23] In this section, "RO" is an abbreviation for the Relaxed
         Ordering Attribute field.

    [24] Some usages are enabled by not implementing this passing
         (see the No RO-enabled PR-PR Passing bit in Section
         7.8.15).

In our case, we were getting notifications of Ingress Packet Delivery in our
Response Queues, but not all of the Ingress Packet Data Posted DMA Write
TLPs had been processed yet by the Root Complex.  As a result, we were
picking up old stale memory data before those lagging Ingress Packet Data
TLPs could be processed.  This is a clear violation of the PCIe 3.0 TLP
processing rules outlined above.

Does that help?

Casey

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:36             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:36 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, August 9, 2017 5:17 AM
|
| On 2017/8/9 11:02, Bjorn Helgaas wrote:
| >
| > On Wed, Aug 09, 2017 at 01:40:01AM +0000, Casey Leedom wrote:
| > >
| >> | From: Bjorn Helgaas <helgaas@kernel.org>
| >> | Sent: Tuesday, August 8, 2017 4:22 PM
| >> | ...
| >> | It should also include a pointer to the AMD erratum, if available, or
| >> | at least some reference to how we know it doesn't obey the rules.
| >>
| >>   Getting an ACK from AMD seems like a forlorn cause at this point.  My
| >> contact was Bob Shaw <Bob.Shaw@amd.com> and he stopped responding to me
| >> messages almost a year ago saying that all of AMD's energies were being
| >> redirected towards upcoming x86 products (likely Ryzen as we now know).
| >> As far as I can tell AMD has walked away from their A1100 (AKA
| >> "Seattle") ARM SoC.
| >>
| >>   On the specific issue, I can certainly write up somthing even more
| >> extensive than I wrote up for the comment in drivers/pci/quirks.c.
| >> Please review the comment I wrote up and tell me if you'd like
| >> something even more detailed -- I'm usually acused of writing comments
| >> which are too long, so this would be a new one on me ... :-)
| >
| > If you have any bug reports with info about how you debugged it and
| > concluded that Seattle is broken, you could include a link (probably
| > in the changelog).  But if there isn't anything, there isn't anything.
| ...
| OK, I could reorganize it, but still need the Casey to give me the link
| for the Seattle, otherwise I could remove the AMD part and wait until
| someone show it. Thanks

There are no links and I was never given an internal bug number at AMD.  As
I said, they stopped responding to my notes about a years ago saying that
they were moving the focus of all their people and no longer had resources
to pursue the issue.  Hopefully for them, Ryzen doesn't have the same
Data Corruption problem ...

As for how we diagnosed it, with our Ingress Packet delivery, we have the
Ingress Packet Data delivered (DMA Write) into Free List Buffers, and then
then a small message (DMA Write) to a "Response Queue" indicating delivery
of the Ingress Packet Data into the Free List Buffers.  The Transaction
Layer Packets which convey the Ingress Packet Data all have the Relaxed
Ordering Attribute set, while the following TLP carring the Ingress Data
delivery notification into the Response Queue does not have the Relaxed
Ordering Attribute set.

The rules for processing TLPs with and without the Relaxed Ordering
Attribute set are covered in Section 2.4.1 of the PCIe 3.0 specification
(Revision 3.0 November 10, 2010).  Table 2-34 "Ordering Rules Summary"
covers the cases where one TLP may "pass" (be proccessed earlier) than a
preceding TLP.  In the case we're talking about, we have a sequence of one
or more Posted DMA Write TLPs with the Relaxed Ordering Attribute set and a
following Posted DMA Write TLP without the Relaxed Ordering Attribute set.
Thus we need to look at the Row A, Column 2 cell of Table 2-34 governing
when a Posted Request may "pass" a preceeding Posted Request.  In that cell
we have:

    a) No
    b) Y/N

with the explanatory text:

    A2a    A Posted Request must not pass another Posted Request
           unless A2b applies.

    A2b    A Posted Request with RO[23] Set is permitted to pass
           another Posted Request[24].  A Posted Request with IDO
           Set is permitted to pass another Posted Request if the
           two Requester IDs are different.

    [23] In this section, "RO" is an abbreviation for the Relaxed
         Ordering Attribute field.

    [24] Some usages are enabled by not implementing this passing
         (see the No RO-enabled PR-PR Passing bit in Section
         7.8.15).

In our case, we were getting notifications of Ingress Packet Delivery in our
Response Queues, but not all of the Ingress Packet Data Posted DMA Write
TLPs had been processed yet by the Root Complex.  As a result, we were
picking up old stale memory data before those lagging Ingress Packet Data
TLPs could be processed.  This is a clear violation of the PCIe 3.0 TLP
processing rules outlined above.

Does that help?

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09 15:58       ` Raj, Ashok
  (?)
  (?)
@ 2017-08-09 16:46         ` Casey Leedom
  -1 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:46 UTC (permalink / raw)
  To: Raj, Ashok, Bjorn Helgaas
  Cc: Ding Tianhong, bhelgaas, Michael Werner, Ganesh GR,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon, mark.rutland,
	robin.murphy, davem, alexander.duyck, linux-arm-kernel, netdev,
	linux-pci, linux-kernel, linuxarm

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 8:58 AM
| ...
| As Casey pointed out in an earlier thread, we choose the heavy hammer
| approach because there are some that can lead to data-corruption as opposed
| to perf degradation.

Careful.  As far as I'm aware, there is no Data Corruption problem
whatsoever with Intel Root Ports and processing of Transaction Layer Packets
with and without the Relaxed Ordering Attribute set.

The only issue which we've discovered with relatively recent Intel Root Port
implementations and the use of the Relaxed Ordering Attribute is a
performance issue.  To the best of our ability to analyze the PCIe traces,
it appeared that the Intel Root Complex delayed returning Link Flow Control
Credits resulting in lowered performance (total bandwidth).  When we used
Relaxed Ordering for Ingress Packet Data delivery on a 100Gb/s Ethernet
link with 1500-byte MTU, we were pegged at ~75Gb/s.  Once we disabled
Relaxed Ordering, we were able to deliver Ingress Packet Data to Host Memory
at the full link rate.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:46         ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:46 UTC (permalink / raw)
  To: Raj, Ashok, Bjorn Helgaas
  Cc: Ding Tianhong, bhelgaas, Michael Werner, Ganesh GR,
	asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit,
	Bob.Shaw, l.stach, amira, gabriele.paoloni, David.Laight,
	jeffrey.t.kirsher, catalin.marinas, will.deacon,
	mark.rutland@arm.com

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 8:58 AM
| ...
| As Casey pointed out in an earlier thread, we choose the heavy hammer
| approach because there are some that can lead to data-corruption as opposed
| to perf degradation.

Careful.  As far as I'm aware, there is no Data Corruption problem
whatsoever with Intel Root Ports and processing of Transaction Layer Packets
with and without the Relaxed Ordering Attribute set.

The only issue which we've discovered with relatively recent Intel Root Port
implementations and the use of the Relaxed Ordering Attribute is a
performance issue.  To the best of our ability to analyze the PCIe traces,
it appeared that the Intel Root Complex delayed returning Link Flow Control
Credits resulting in lowered performance (total bandwidth).  When we used
Relaxed Ordering for Ingress Packet Data delivery on a 100Gb/s Ethernet
link with 1500-byte MTU, we were pegged at ~75Gb/s.  Once we disabled
Relaxed Ordering, we were able to deliver Ingress Packet Data to Host Memory
at the full link rate.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:46         ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:46 UTC (permalink / raw)
  To: Raj, Ashok, Bjorn Helgaas
  Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
	will.deacon, linuxarm, alexander.duyck, amira, jeffrey.t.kirsher,
	linux-pci, Ganesh GR, Bob.Shaw, patrick.j.cramer, Ding Tianhong,
	bhelgaas, Michael Werner, linux-arm-kernel, netdev, linux-kernel,
	David.Laight, Suravee.Suthikulpanit, robin.murphy, davem,
	l.stach

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 8:58 AM
| ...
| As Casey pointed out in an earlier thread, we choose the heavy hammer
| approach because there are some that can lead to data-corruption as opposed
| to perf degradation.

Careful.  As far as I'm aware, there is no Data Corruption problem
whatsoever with Intel Root Ports and processing of Transaction Layer Packets
with and without the Relaxed Ordering Attribute set.

The only issue which we've discovered with relatively recent Intel Root Port
implementations and the use of the Relaxed Ordering Attribute is a
performance issue.  To the best of our ability to analyze the PCIe traces,
it appeared that the Intel Root Complex delayed returning Link Flow Control
Credits resulting in lowered performance (total bandwidth).  When we used
Relaxed Ordering for Ingress Packet Data delivery on a 100Gb/s Ethernet
link with 1500-byte MTU, we were pegged at ~75Gb/s.  Once we disabled
Relaxed Ordering, we were able to deliver Ingress Packet Data to Host Memory
at the full link rate.

Casey

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 16:46         ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 16:46 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 8:58 AM
| ...
| As Casey pointed out in an earlier thread, we choose the heavy hammer
| approach because there are some that can lead to data-corruption as opposed
| to perf degradation.

Careful.  As far as I'm aware, there is no Data Corruption problem
whatsoever with Intel Root Ports and processing of Transaction Layer Packets
with and without the Relaxed Ordering Attribute set.

The only issue which we've discovered with relatively recent Intel Root Port
implementations and the use of the Relaxed Ordering Attribute is a
performance issue.  To the best of our ability to analyze the PCIe traces,
it appeared that the Intel Root Complex delayed returning Link Flow Control
Credits resulting in lowered performance (total bandwidth).  When we used
Relaxed Ordering for Ingress Packet Data delivery on a 100Gb/s Ethernet
link with 1500-byte MTU, we were pegged at ~75Gb/s.  Once we disabled
Relaxed Ordering, we were able to deliver Ingress Packet Data to Host Memory
at the full link rate.

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09 16:46         ` Casey Leedom
  (?)
  (?)
@ 2017-08-09 18:00           ` Raj, Ashok
  -1 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 18:00 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm, ashok.raj

On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Wednesday, August 9, 2017 8:58 AM
> | ...
> | As Casey pointed out in an earlier thread, we choose the heavy hammer
> | approach because there are some that can lead to data-corruption as opposed
> | to perf degradation.
> 
> Careful.  As far as I'm aware, there is no Data Corruption problem
> whatsoever with Intel Root Ports and processing of Transaction Layer Packets
> with and without the Relaxed Ordering Attribute set.

That's right.. no data-corruption on Intel parts :-).. It was with
other vendor. Only performance issue with intel root-ports in the parts
identified by the optimization guide. 

Cheers,
AShok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 18:00           ` Raj, Ashok
  0 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 18:00 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon

On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Wednesday, August 9, 2017 8:58 AM
> | ...
> | As Casey pointed out in an earlier thread, we choose the heavy hammer
> | approach because there are some that can lead to data-corruption as opposed
> | to perf degradation.
> 
> Careful.  As far as I'm aware, there is no Data Corruption problem
> whatsoever with Intel Root Ports and processing of Transaction Layer Packets
> with and without the Relaxed Ordering Attribute set.

That's right.. no data-corruption on Intel parts :-).. It was with
other vendor. Only performance issue with intel root-ports in the parts
identified by the optimization guide. 

Cheers,
AShok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 18:00           ` Raj, Ashok
  0 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 18:00 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm, ashok.raj

On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Wednesday, August 9, 2017 8:58 AM
> | ...
> | As Casey pointed out in an earlier thread, we choose the heavy hammer
> | approach because there are some that can lead to data-corruption as opposed
> | to perf degradation.
> 
> Careful.  As far as I'm aware, there is no Data Corruption problem
> whatsoever with Intel Root Ports and processing of Transaction Layer Packets
> with and without the Relaxed Ordering Attribute set.

That's right.. no data-corruption on Intel parts :-).. It was with
other vendor. Only performance issue with intel root-ports in the parts
identified by the optimization guide. 

Cheers,
AShok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 18:00           ` Raj, Ashok
  0 siblings, 0 replies; 70+ messages in thread
From: Raj, Ashok @ 2017-08-09 18:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
> | From: Raj, Ashok <ashok.raj@intel.com>
> | Sent: Wednesday, August 9, 2017 8:58 AM
> | ...
> | As Casey pointed out in an earlier thread, we choose the heavy hammer
> | approach because there are some that can lead to data-corruption as opposed
> | to perf degradation.
> 
> Careful.  As far as I'm aware, there is no Data Corruption problem
> whatsoever with Intel Root Ports and processing of Transaction Layer Packets
> with and without the Relaxed Ordering Attribute set.

That's right.. no data-corruption on Intel parts :-).. It was with
other vendor. Only performance issue with intel root-ports in the parts
identified by the optimization guide. 

Cheers,
AShok

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-08-09 18:00           ` Raj, Ashok
  (?)
  (?)
@ 2017-08-09 20:11             ` Casey Leedom
  -1 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 20:11 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 11:00 AM
|
| On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
| > | From: Raj, Ashok <ashok.raj@intel.com>
| > | Sent: Wednesday, August 9, 2017 8:58 AM
| > | ...
| > | As Casey pointed out in an earlier thread, we choose the heavy hammer
| > | approach because there are some that can lead to data-corruption as
| > | opposed to perf degradation.
| >
| > Careful.  As far as I'm aware, there is no Data Corruption problem
| > whatsoever with Intel Root Ports and processing of Transaction Layer
| > Packets with and without the Relaxed Ordering Attribute set.
|
| That's right.. no data-corruption on Intel parts :-).. It was with other
| vendor. Only performance issue with intel root-ports in the parts identified
| by the optimization guide.

Yes, I didn't want you to get into any trouble over that possible reading of
what you wrote.

Any progress on the "Chicken Bit" investigation?  Being able to disable the
non-optimal Relaxed Ordering "optimization" would be the best PCI Quirk of
all ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 20:11             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 20:11 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 11:00 AM
|
| On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
| > | From: Raj, Ashok <ashok.raj@intel.com>
| > | Sent: Wednesday, August 9, 2017 8:58 AM
| > | ...
| > | As Casey pointed out in an earlier thread, we choose the heavy hammer
| > | approach because there are some that can lead to data-corruption as
| > | opposed to perf degradation.
| >
| > Careful.  As far as I'm aware, there is no Data Corruption problem
| > whatsoever with Intel Root Ports and processing of Transaction Layer
| > Packets with and without the Relaxed Ordering Attribute set.
|
| That's right.. no data-corruption on Intel parts :-).. It was with other
| vendor. Only performance issue with intel root-ports in the parts identified
| by the optimization guide.

Yes, I didn't want you to get into any trouble over that possible reading of
what you wrote.

Any progress on the "Chicken Bit" investigation?  Being able to disable the
non-optimal Relaxed Ordering "optimization" would be the best PCI Quirk of
all ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 20:11             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 20:11 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Bjorn Helgaas, Ding Tianhong, bhelgaas, Michael Werner,
	Ganesh GR, asit.k.mallick, patrick.j.cramer,
	Suravee.Suthikulpanit, Bob.Shaw, l.stach, amira,
	gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
	catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
	alexander.duyck, linux-arm-kernel, netdev, linux-pci,
	linux-kernel, linuxarm

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 11:00 AM
|
| On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
| > | From: Raj, Ashok <ashok.raj@intel.com>
| > | Sent: Wednesday, August 9, 2017 8:58 AM
| > | ...
| > | As Casey pointed out in an earlier thread, we choose the heavy hammer
| > | approach because there are some that can lead to data-corruption as
| > | opposed to perf degradation.
| >
| > Careful.  As far as I'm aware, there is no Data Corruption problem
| > whatsoever with Intel Root Ports and processing of Transaction Layer
| > Packets with and without the Relaxed Ordering Attribute set.
|
| That's right.. no data-corruption on Intel parts :-).. It was with other
| vendor. Only performance issue with intel root-ports in the parts identif=
ied
| by the optimization guide.

Yes, I didn't want you to get into any trouble over that possible reading o=
f
what you wrote.

Any progress on the "Chicken Bit" investigation?  Being able to disable the
non-optimal Relaxed Ordering "optimization" would be the best PCI Quirk of
all ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-08-09 20:11             ` Casey Leedom
  0 siblings, 0 replies; 70+ messages in thread
From: Casey Leedom @ 2017-08-09 20:11 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Raj, Ashok <ashok.raj@intel.com>
| Sent: Wednesday, August 9, 2017 11:00 AM
|
| On Wed, Aug 09, 2017 at 04:46:07PM +0000, Casey Leedom wrote:
| > | From: Raj, Ashok <ashok.raj@intel.com>
| > | Sent: Wednesday, August 9, 2017 8:58 AM
| > | ...
| > | As Casey pointed out in an earlier thread, we choose the heavy hammer
| > | approach because there are some that can lead to data-corruption as
| > | opposed to perf degradation.
| >
| > Careful.  As far as I'm aware, there is no Data Corruption problem
| > whatsoever with Intel Root Ports and processing of Transaction Layer
| > Packets with and without the Relaxed Ordering Attribute set.
|
| That's right.. no data-corruption on Intel parts :-).. It was with other
| vendor. Only performance issue with intel root-ports in the parts identified
| by the optimization guide.

Yes, I didn't want you to get into any trouble over that possible reading of
what you wrote.

Any progress on the "Chicken Bit" investigation?  Being able to disable the
non-optimal Relaxed Ordering "optimization" would be the best PCI Quirk of
all ...

Casey

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2017-08-09 20:11 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-05  7:15 [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Ding Tianhong
2017-08-05  7:15 ` Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-08 23:22   ` Bjorn Helgaas
2017-08-08 23:22     ` Bjorn Helgaas
2017-08-08 23:22     ` Bjorn Helgaas
2017-08-09  1:40     ` Casey Leedom
2017-08-09  1:40       ` Casey Leedom
2017-08-09  1:40       ` Casey Leedom
2017-08-09  1:40       ` Casey Leedom
2017-08-09  3:02       ` Bjorn Helgaas
2017-08-09  3:02         ` Bjorn Helgaas
2017-08-09  3:02         ` Bjorn Helgaas
2017-08-09  3:02         ` Bjorn Helgaas
2017-08-09 12:17         ` Ding Tianhong
2017-08-09 12:17           ` Ding Tianhong
2017-08-09 12:17           ` Ding Tianhong
2017-08-09 12:17           ` Ding Tianhong
2017-08-09 16:36           ` Casey Leedom
2017-08-09 16:36             ` Casey Leedom
2017-08-09 16:36             ` Casey Leedom
2017-08-09 16:36             ` Casey Leedom
2017-08-09 15:58     ` Raj, Ashok
2017-08-09 15:58       ` Raj, Ashok
2017-08-09 16:46       ` Casey Leedom
2017-08-09 16:46         ` Casey Leedom
2017-08-09 16:46         ` Casey Leedom
2017-08-09 16:46         ` Casey Leedom
2017-08-09 18:00         ` Raj, Ashok
2017-08-09 18:00           ` Raj, Ashok
2017-08-09 18:00           ` Raj, Ashok
2017-08-09 18:00           ` Raj, Ashok
2017-08-09 20:11           ` Casey Leedom
2017-08-09 20:11             ` Casey Leedom
2017-08-09 20:11             ` Casey Leedom
2017-08-09 20:11             ` Casey Leedom
2017-08-05  7:15 ` [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-09  2:22   ` Bjorn Helgaas
2017-08-09  2:22     ` Bjorn Helgaas
2017-08-09  2:22     ` Bjorn Helgaas
2017-08-09  3:25     ` Bjorn Helgaas
2017-08-09  3:25       ` Bjorn Helgaas
2017-08-09  3:25       ` Bjorn Helgaas
2017-08-09 13:42       ` Ding Tianhong
2017-08-09 13:42         ` Ding Tianhong
2017-08-09 12:33     ` Casey Leedom
2017-08-09 12:33       ` Casey Leedom
2017-08-09 12:33       ` Casey Leedom
2017-08-09 12:33       ` Casey Leedom
2017-08-09 13:23     ` Ding Tianhong
2017-08-09 13:23       ` Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 3/4] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 4/4] net/cxgb4vf: " Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-05  7:15   ` Ding Tianhong
2017-08-07  3:47 ` [PATCH v9 0/4] Add " David Miller
2017-08-07  3:47   ` David Miller
2017-08-07  4:13   ` Ding Tianhong
2017-08-07  4:13     ` Ding Tianhong
2017-08-07  4:13     ` Ding Tianhong
2017-08-07 21:14     ` David Miller
2017-08-07 21:14       ` David Miller
2017-08-08  1:56       ` Bjorn Helgaas
2017-08-08  1:56         ` Bjorn Helgaas
2017-08-08  1:56         ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.