All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-05-01 23:13 ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: Bjorn Helgaas, leedom
  Cc: Casey Leedom, Michael Werner, Ganesh Goudar, Arjun V,
	David Miller, Asit K Mallick, Patrick J Cramer, Ashok Raj,
	Suravee Suthikulpanit, Bob Shaw, h, Alexander Duyck,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

Casey Leedom (2):
  PCI: Add new PCIe Fabric End Node flag,
    PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 ++--
 drivers/pci/quirks.c                            | 38 +++++++++++++++++++++++++
 include/linux/pci.h                             |  2 ++
 5 files changed, 61 insertions(+), 2 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/2] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-05-01 23:13 ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: linux-arm-kernel

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

Casey Leedom (2):
  PCI: Add new PCIe Fabric End Node flag,
    PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 ++--
 drivers/pci/quirks.c                            | 38 +++++++++++++++++++++++++
 include/linux/pci.h                             |  2 ++
 5 files changed, 61 insertions(+), 2 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-01 23:13 ` Casey Leedom
@ 2017-05-01 23:13   ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: Bjorn Helgaas, leedom
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Alexander Duyck, Ashok Raj, Ganesh Goudar,
	David Miller, jeffrey.t.kirsher, Bob Shaw, Casey Leedom,
	Patrick J Cramer, Arjun V, Ding Tianhong, Michael Werner,
	linux-arm-kernel, Amir Ancel, netdev, David Laight,
	Suravee Suthikulpanit, Robin Murphy, davem

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged.  Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
---
 drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f754453..4ae78b3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb3da1a..6764f66 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* Don't use Relaxed Ordering for TLPs directed at this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-01 23:13   ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: linux-arm-kernel

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged.  Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
---
 drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f754453..4ae78b3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 			      quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+			      quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb3da1a..6764f66 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* Don't use Relaxed Ordering for TLPs directed@this device */
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/2] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-05-01 23:13 ` Casey Leedom
@ 2017-05-01 23:13   ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: Bjorn Helgaas, leedom
  Cc: Casey Leedom, Michael Werner, Ganesh Goudar, Arjun V,
	David Miller, Asit K Mallick, Patrick J Cramer, Ashok Raj,
	Suravee Suthikulpanit, Bob Shaw, h, Alexander Duyck,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight

cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
send TLPs to it with the Relaxed Ordering Attribute set.
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 +++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 163543b..46d61b1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -512,6 +512,7 @@ enum {                                 /* adapter flags */
 	USING_SOFT_PARAMS  = (1 << 6),
 	MASTER_PF          = (1 << 7),
 	FW_OFLD_CONN       = (1 << 9),
+	ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index afb0967..510c020 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4636,6 +4636,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 #ifdef CONFIG_PCI_IOV
 	u32 v, port_vec;
 #endif
+	struct pci_dev *root;
 
 	printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
 
@@ -4734,6 +4735,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->msg_enable = DFLT_MSG_ENABLE;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * So we check our Root Complex to see if it's flaged with advice
+	 * against using Relaxed Ordering.
+	 */
+	root = pci_find_pcie_root_port(adapter->pdev);
+	if (root && (root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->tid_release_lock);
 	spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index f05f0d4..ac229a3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 	struct fw_iq_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/* Size needs to be multiple of 16, including status entry. */
 	iq->size = roundup(iq->size, 16);
@@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
 		flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
 		c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-					     FW_IQ_CMD_FL0FETCHRO_F |
-					     FW_IQ_CMD_FL0DATARO_F |
+					     FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+					     FW_IQ_CMD_FL0DATARO_V(relaxed) |
 					     FW_IQ_CMD_FL0PADEN_F);
 		if (cong >= 0)
 			c.iqns_to_fl0congen |=
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/2] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-05-01 23:13   ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-01 23:13 UTC (permalink / raw)
  To: linux-arm-kernel

cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
send TLPs to it with the Relaxed Ordering Attribute set.
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 +++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 163543b..46d61b1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -512,6 +512,7 @@ enum {                                 /* adapter flags */
 	USING_SOFT_PARAMS  = (1 << 6),
 	MASTER_PF          = (1 << 7),
 	FW_OFLD_CONN       = (1 << 9),
+	ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index afb0967..510c020 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4636,6 +4636,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 #ifdef CONFIG_PCI_IOV
 	u32 v, port_vec;
 #endif
+	struct pci_dev *root;
 
 	printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
 
@@ -4734,6 +4735,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->msg_enable = DFLT_MSG_ENABLE;
 	memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+	/* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+	 * Ingress Packet Data to Free List Buffers in order to allow for
+	 * chipset performance optimizations between the Root Complex and
+	 * Memory Controllers.  (Messages to the associated Ingress Queue
+	 * notifying new Packet Placement in the Free Lists Buffers will be
+	 * send without the Relaxed Ordering Attribute thus guaranteing that
+	 * all preceding PCIe Transaction Layer Packets will be processed
+	 * first.)  But some Root Complexes have various issues with Upstream
+	 * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+	 * So we check our Root Complex to see if it's flaged with advice
+	 * against using Relaxed Ordering.
+	 */
+	root = pci_find_pcie_root_port(adapter->pdev);
+	if (root && (root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))
+		adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
 	spin_lock_init(&adapter->stats_lock);
 	spin_lock_init(&adapter->tid_release_lock);
 	spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index f05f0d4..ac229a3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 	struct fw_iq_cmd c;
 	struct sge *s = &adap->sge;
 	struct port_info *pi = netdev_priv(dev);
+	int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
 	/* Size needs to be multiple of 16, including status entry. */
 	iq->size = roundup(iq->size, 16);
@@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
 		flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
 		c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-					     FW_IQ_CMD_FL0FETCHRO_F |
-					     FW_IQ_CMD_FL0DATARO_F |
+					     FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+					     FW_IQ_CMD_FL0DATARO_V(relaxed) |
 					     FW_IQ_CMD_FL0PADEN_F);
 		if (cong >= 0)
 			c.iqns_to_fl0congen |=
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  2017-05-01 23:13 ` Casey Leedom
@ 2017-05-02  0:56   ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-02  0:56 UTC (permalink / raw)
  To: Casey Leedom, Bjorn Helgaas, leedom
  Cc: Michael Werner, Ganesh Goudar, Arjun V, David Miller,
	Asit K Mallick, Patrick J Cramer, Ashok Raj,
	Suravee Suthikulpanit, Bob Shaw, h, Alexander Duyck,
	Mark Rutland, Amir Ancel, Gabriele Paoloni, Catalin Marinas,
	Will Deacon, LinuxArm, David Laight, jeffrey.t.kirsher, netdev

Hi Casey:

This solution looks good to me, I will test this for ixgbe.:)

Thanks
Ding

On 2017/5/2 7:13, Casey Leedom wrote:
> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
> 
> It's been years since I've submitted kernel.org patches, I appolgise for the
> almost certain submission errors.
> 
> Casey Leedom (2):
>   PCI: Add new PCIe Fabric End Node flag,
>     PCI_DEV_FLAGS_NO_RELAXED_ORDERING
>   net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
> 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++
>  drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 ++--
>  drivers/pci/quirks.c                            | 38 +++++++++++++++++++++++++
>  include/linux/pci.h                             |  2 ++
>  5 files changed, 61 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 0/2] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
@ 2017-05-02  0:56   ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-02  0:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Casey:

This solution looks good to me, I will test this for ixgbe.:)

Thanks
Ding

On 2017/5/2 7:13, Casey Leedom wrote:
> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
> 
> It's been years since I've submitted kernel.org patches, I appolgise for the
> almost certain submission errors.
> 
> Casey Leedom (2):
>   PCI: Add new PCIe Fabric End Node flag,
>     PCI_DEV_FLAGS_NO_RELAXED_ORDERING
>   net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
> 
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |  1 +
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++
>  drivers/net/ethernet/chelsio/cxgb4/sge.c        |  5 ++--
>  drivers/pci/quirks.c                            | 38 +++++++++++++++++++++++++
>  include/linux/pci.h                             |  2 ++
>  5 files changed, 61 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-01 23:13   ` Casey Leedom
@ 2017-05-02  6:49     ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-02  6:49 UTC (permalink / raw)
  To: Casey Leedom, Bjorn Helgaas, leedom
  Cc: Michael Werner, Ganesh Goudar, Arjun V, David Miller,
	Asit K Mallick, Patrick J Cramer, Ashok Raj,
	Suravee Suthikulpanit, Bob Shaw, h, Alexander Duyck,
	Mark Rutland, Amir Ancel, Gabriele Paoloni, Catalin Marinas,
	Will Deacon, LinuxArm, David Laight, jeffrey.t.kirsher, netdev

hi, Casey:

On 2017/5/2 7:13, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* Don't use Relaxed Ordering for TLPs directed at this device */
> +	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };

What about add a new general func to check the RO for several drivers to use them ?

just like:

#define pci_dev_support_relaxed_ordering(struct pci_dev *root) \
	(!(root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))

Thanks
Ding

>  
>  enum pci_irq_reroute_variant {
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-02  6:49     ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-02  6:49 UTC (permalink / raw)
  To: linux-arm-kernel

hi, Casey:

On 2017/5/2 7:13, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* Don't use Relaxed Ordering for TLPs directed at this device */
> +	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };

What about add a new general func to check the RO for several drivers to use them ?

just like:

#define pci_dev_support_relaxed_ordering(struct pci_dev *root) \
	(!(root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))

Thanks
Ding

>  
>  enum pci_irq_reroute_variant {
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-01 23:13   ` Casey Leedom
@ 2017-05-02 16:39     ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-02 16:39 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Bjorn Helgaas, leedom, Michael Werner, Ganesh Goudar, Arjun V,
	David Miller, Asit K Mallick, Patrick J Cramer, Ashok Raj,
	Suravee Suthikulpanit, Bob Shaw, h, Ding Tianhong, Mark Rutland,
	Amir Ancel, Gabriele Paoloni, Catalin Marinas, Will Deacon,
	LinuxArm, David Laight, Jeff Kirsher

On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

So this is a good first step though might I suggest one other change.

We may want to add logic to the core PCIe code that clears the "Enable
Relaxed Ordering" bit in the device control register for all devices
hanging off of this root complex. Assuming the devices conform to the
PCIe spec doing that should disable relaxed ordering in a device
agnostic way that then enables us at a driver level to just enable the
feature always without having to perform any checks for your flag. We
could probably do that as a part of probing the PCIe interfaces
hanging off of these devices.

> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>                               quirk_tw686x_class);
>
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>         /* Get VPD from function 0 VPD */
>         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };
>
>  enum pci_irq_reroute_variant {
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-02 16:39     ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-02 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

So this is a good first step though might I suggest one other change.

We may want to add logic to the core PCIe code that clears the "Enable
Relaxed Ordering" bit in the device control register for all devices
hanging off of this root complex. Assuming the devices conform to the
PCIe spec doing that should disable relaxed ordering in a device
agnostic way that then enables us at a driver level to just enable the
feature always without having to perform any checks for your flag. We
could probably do that as a part of probing the PCIe interfaces
hanging off of these devices.

> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>                               quirk_tw686x_class);
>
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +                             quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>         /* Get VPD from function 0 VPD */
>         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };
>
>  enum pci_irq_reroute_variant {
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-01 23:13   ` Casey Leedom
@ 2017-05-02 16:44     ` Raj, Ashok
  -1 siblings, 0 replies; 50+ messages in thread
From: Raj, Ashok @ 2017-05-02 16:44 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Bjorn Helgaas, leedom, Michael Werner, Ganesh Goudar, Arjun V,
	David Miller, Asit K Mallick, Patrick J Cramer,
	Suravee Suthikulpanit, Bob Shaw, h, Alexander Duyck,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, je

Hi Casey


On Mon, May 01, 2017 at 04:13:50PM -0700, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +

You might want to add the RP ID's for both HSX/BDX. Tne entire range 
is 2F01H-2F0EH & 6F01H-6F0EH.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-02 16:44     ` Raj, Ashok
  0 siblings, 0 replies; 50+ messages in thread
From: Raj, Ashok @ 2017-05-02 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Casey


On Mon, May 01, 2017 at 04:13:50PM -0700, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  			      quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +			      quirk_relaxedordering_disable);
> +

You might want to add the RP ID's for both HSX/BDX. Tne entire range 
is 2F01H-2F0EH & 6F01H-6F0EH.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-02 16:39     ` Alexander Duyck
@ 2017-05-02 16:53       ` Raj, Ashok
  -1 siblings, 0 replies; 50+ messages in thread
From: Raj, Ashok @ 2017-05-02 16:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Casey Leedom, Bjorn Helgaas, leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, Jeff

On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> > Ordering Attribute should not be used on Transaction Layer Packets destined
> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> 
> So this is a good first step though might I suggest one other change.
> 
> We may want to add logic to the core PCIe code that clears the "Enable
> Relaxed Ordering" bit in the device control register for all devices
> hanging off of this root complex. Assuming the devices conform to the
> PCIe spec doing that should disable relaxed ordering in a device
> agnostic way that then enables us at a driver level to just enable the
> feature always without having to perform any checks for your flag. We
> could probably do that as a part of probing the PCIe interfaces
> hanging off of these devices.

I suppose you don't want to turn off RO completely on the device. When
traffic is targetted to mmio for peer to peer reasons RO has performance
upside. The specific issue with these root ports indicate limitation using 
RO for traffic targetting coherent memory.

> 
> > ---
> >  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h  |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f754453..4ae78b3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> >                               quirk_tw686x_class);
> >
> >  /*
> > + * Some devices have problems with Transaction Layer Packets with the Relaxed
> > + * Ordering Attribute set.  Such devices should mark themselves and other
> > + * Device Drivers should check before sending TLPs with RO set.
> > + */
> > +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> > +{
> > +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> > +}
> > +
> > +/*
> > + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> > + * cause performance problems with Upstream Transaction Layer Packets with
> > + * Relaxed Ordering set.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> > + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> > + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> > + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> > + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> > + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> > + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> > + * Ordering for Upstream TLPs.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> >   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
> >   * values for the Attribute as were supplied in the header of the
> >   * corresponding Request, except as explicitly allowed when IDO is used."
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index eb3da1a..6764f66 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -178,6 +178,8 @@ enum pci_dev_flags {
> >         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> >         /* Get VPD from function 0 VPD */
> >         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> > +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 1.9.1
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-02 16:53       ` Raj, Ashok
  0 siblings, 0 replies; 50+ messages in thread
From: Raj, Ashok @ 2017-05-02 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> > Ordering Attribute should not be used on Transaction Layer Packets destined
> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> 
> So this is a good first step though might I suggest one other change.
> 
> We may want to add logic to the core PCIe code that clears the "Enable
> Relaxed Ordering" bit in the device control register for all devices
> hanging off of this root complex. Assuming the devices conform to the
> PCIe spec doing that should disable relaxed ordering in a device
> agnostic way that then enables us at a driver level to just enable the
> feature always without having to perform any checks for your flag. We
> could probably do that as a part of probing the PCIe interfaces
> hanging off of these devices.

I suppose you don't want to turn off RO completely on the device. When
traffic is targetted to mmio for peer to peer reasons RO has performance
upside. The specific issue with these root ports indicate limitation using 
RO for traffic targetting coherent memory.

> 
> > ---
> >  drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h  |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f754453..4ae78b3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> >                               quirk_tw686x_class);
> >
> >  /*
> > + * Some devices have problems with Transaction Layer Packets with the Relaxed
> > + * Ordering Attribute set.  Such devices should mark themselves and other
> > + * Device Drivers should check before sending TLPs with RO set.
> > + */
> > +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> > +{
> > +       dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> > +}
> > +
> > +/*
> > + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> > + * cause performance problems with Upstream Transaction Layer Packets with
> > + * Relaxed Ordering set.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> > + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> > + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> > + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> > + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> > + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> > + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> > + * Ordering for Upstream TLPs.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> > +                             quirk_relaxedordering_disable);
> > +
> > +/*
> >   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
> >   * values for the Attribute as were supplied in the header of the
> >   * corresponding Request, except as explicitly allowed when IDO is used."
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index eb3da1a..6764f66 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -178,6 +178,8 @@ enum pci_dev_flags {
> >         PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> >         /* Get VPD from function 0 VPD */
> >         PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +       /* Don't use Relaxed Ordering for TLPs directed at this device */
> > +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 1.9.1
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-02 16:53       ` Raj, Ashok
@ 2017-05-02 18:10         ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-02 18:10 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Casey Leedom, Bjorn Helgaas, leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, Jeff

On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>>
>> So this is a good first step though might I suggest one other change.
>>
>> We may want to add logic to the core PCIe code that clears the "Enable
>> Relaxed Ordering" bit in the device control register for all devices
>> hanging off of this root complex. Assuming the devices conform to the
>> PCIe spec doing that should disable relaxed ordering in a device
>> agnostic way that then enables us at a driver level to just enable the
>> feature always without having to perform any checks for your flag. We
>> could probably do that as a part of probing the PCIe interfaces
>> hanging off of these devices.
>
> I suppose you don't want to turn off RO completely on the device. When
> traffic is targetted to mmio for peer to peer reasons RO has performance
> upside. The specific issue with these root ports indicate limitation using
> RO for traffic targetting coherent memory.

Actually my main concern here is virtualization. If I take the PCIe
function and direct assign it I have no way of seeing the root complex
flag as it is now virtualized away. In the meantime the guest now has
the ability to enable the function and sees nothing that says you
can't enable relaxed ordering which in turn ends up potentially
causing data corruption on the system. I want relaxed ordering
disabled before I even consider assigning it to the guest on the
systems where this would be an issue.

I prefer to err on the side of caution with this. Enabling Relaxed
Ordering is technically a performance enhancement, so we function but
not as well as we would like, while having it enabled when there are
issues can lead to data corruption. I would weigh the risk of data
corruption the thing to be avoided and of much higher priority than
enabling improved performance. As such I say we should default the
relaxed ordering attribute to off in general and look at
"white-listing" it in for various architectures and/or chipsets that
support/need it rather than having it enabled by default and trying to
switch it off after the fact when we find some new issue.

So for example, in the case of x86 it seems like there are multiple
root complexes that have issues, and the gains for enabling it with
standard DMA to host memory are small. As such we may want to default
it to off via the architecture specific PCIe code and then look at
having "white-list" cases where we enable it for things like
peer-to-peer accesses. In the case of SPARC we could look at
defaulting it to on, and only "black-list" any cases where there might
be issues since SPARC relies on this in a significant way for
performance. In the case of ARM and other architectures it is a bit of
a toss-up. I would say we could just default it to on for now and
"black-list" anything that doesn't work, or we could go the other way
like I suggested for x86. It all depends on what the ARM community
would want to agree on for this. I would say unless it makes a
significant difference like it does in the case of SPARC we are
probably better off just defaulting it to off.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-02 18:10         ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-02 18:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>>
>> So this is a good first step though might I suggest one other change.
>>
>> We may want to add logic to the core PCIe code that clears the "Enable
>> Relaxed Ordering" bit in the device control register for all devices
>> hanging off of this root complex. Assuming the devices conform to the
>> PCIe spec doing that should disable relaxed ordering in a device
>> agnostic way that then enables us at a driver level to just enable the
>> feature always without having to perform any checks for your flag. We
>> could probably do that as a part of probing the PCIe interfaces
>> hanging off of these devices.
>
> I suppose you don't want to turn off RO completely on the device. When
> traffic is targetted to mmio for peer to peer reasons RO has performance
> upside. The specific issue with these root ports indicate limitation using
> RO for traffic targetting coherent memory.

Actually my main concern here is virtualization. If I take the PCIe
function and direct assign it I have no way of seeing the root complex
flag as it is now virtualized away. In the meantime the guest now has
the ability to enable the function and sees nothing that says you
can't enable relaxed ordering which in turn ends up potentially
causing data corruption on the system. I want relaxed ordering
disabled before I even consider assigning it to the guest on the
systems where this would be an issue.

I prefer to err on the side of caution with this. Enabling Relaxed
Ordering is technically a performance enhancement, so we function but
not as well as we would like, while having it enabled when there are
issues can lead to data corruption. I would weigh the risk of data
corruption the thing to be avoided and of much higher priority than
enabling improved performance. As such I say we should default the
relaxed ordering attribute to off in general and look at
"white-listing" it in for various architectures and/or chipsets that
support/need it rather than having it enabled by default and trying to
switch it off after the fact when we find some new issue.

So for example, in the case of x86 it seems like there are multiple
root complexes that have issues, and the gains for enabling it with
standard DMA to host memory are small. As such we may want to default
it to off via the architecture specific PCIe code and then look at
having "white-list" cases where we enable it for things like
peer-to-peer accesses. In the case of SPARC we could look at
defaulting it to on, and only "black-list" any cases where there might
be issues since SPARC relies on this in a significant way for
performance. In the case of ARM and other architectures it is a bit of
a toss-up. I would say we could just default it to on for now and
"black-list" anything that doesn't work, or we could go the other way
like I suggested for x86. It all depends on what the ARM community
would want to agree on for this. I would say unless it makes a
significant difference like it does in the case of SPARC we are
probably better off just defaulting it to off.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-02 18:10         ` Alexander Duyck
  (?)
@ 2017-05-02 19:34         ` Raj, Ashok
  2017-05-02 22:41           ` Alexander Duyck
  -1 siblings, 1 reply; 50+ messages in thread
From: Raj, Ashok @ 2017-05-02 19:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Casey Leedom, Bjorn Helgaas, leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight, Jeff

On Tue, May 02, 2017 at 11:10:22AM -0700, Alexander Duyck wrote:
> On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> > On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> >> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
> >> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> >> > Ordering Attribute should not be used on Transaction Layer Packets destined
> >> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> >> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> >> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> >> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> >>
> >> So this is a good first step though might I suggest one other change.
> >>
> >> We may want to add logic to the core PCIe code that clears the "Enable
> >> Relaxed Ordering" bit in the device control register for all devices
> >> hanging off of this root complex. Assuming the devices conform to the
> >> PCIe spec doing that should disable relaxed ordering in a device
> >> agnostic way that then enables us at a driver level to just enable the
> >> feature always without having to perform any checks for your flag. We
> >> could probably do that as a part of probing the PCIe interfaces
> >> hanging off of these devices.
> >
> > I suppose you don't want to turn off RO completely on the device. When
> > traffic is targetted to mmio for peer to peer reasons RO has performance
> > upside. The specific issue with these root ports indicate limitation using
> > RO for traffic targetting coherent memory.
> 
> Actually my main concern here is virtualization. If I take the PCIe
> function and direct assign it I have no way of seeing the root complex
> flag as it is now virtualized away. In the meantime the guest now has
> the ability to enable the function and sees nothing that says you
> can't enable relaxed ordering which in turn ends up potentially
> causing data corruption on the system. I want relaxed ordering
> disabled before I even consider assigning it to the guest on the
> systems where this would be an issue.
> 
> I prefer to err on the side of caution with this. Enabling Relaxed
> Ordering is technically a performance enhancement, so we function but
> not as well as we would like, while having it enabled when there are
> issues can lead to data corruption. I would weigh the risk of data
> corruption the thing to be avoided and of much higher priority than
> enabling improved performance. As such I say we should default the
> relaxed ordering attribute to off in general and look at
> "white-listing" it in for various architectures and/or chipsets that
> support/need it rather than having it enabled by default and trying to
> switch it off after the fact when we find some new issue.

I agree, after thinking about it a bit more.. even for transactions going to
p2p, i'm just reading the pcie spec and some sections aren't super clear
about completion redirect and ACS rules for p2p.

Also it appears the device control default value is 1 for enabling
Relaxed Ordering. This means we should probably save these states across
resets/FLR for e.g. To ensure perf isn't affected after a FLR.
> 
> So for example, in the case of x86 it seems like there are multiple
> root complexes that have issues, and the gains for enabling it with
> standard DMA to host memory are small. As such we may want to default
> it to off via the architecture specific PCIe code and then look at
> having "white-list" cases where we enable it for things like
> peer-to-peer accesses. In the case of SPARC we could look at
> defaulting it to on, and only "black-list" any cases where there might
> be issues since SPARC relies on this in a significant way for
> performance. In the case of ARM and other architectures it is a bit of
> a toss-up. I would say we could just default it to on for now and
> "black-list" anything that doesn't work, or we could go the other way
> like I suggested for x86. It all depends on what the ARM community
> would want to agree on for this. I would say unless it makes a
> significant difference like it does in the case of SPARC we are
> probably better off just defaulting it to off.
> 
> - Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-02 19:34         ` Raj, Ashok
@ 2017-05-02 22:41           ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-02 22:41 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Casey Leedom, Bjorn Helgaas, Casey Leedom, Michael Werner,
	Ganesh Goudar, Arjun V, David Miller, Asit K Mallick,
	Patrick J Cramer, Suravee Suthikulpanit, Bob Shaw, h,
	Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight

On Tue, May 2, 2017 at 12:34 PM, Raj, Ashok <ashok.raj@intel.com> wrote:
> On Tue, May 02, 2017 at 11:10:22AM -0700, Alexander Duyck wrote:
>> On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
>> > On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> >> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> >> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> >> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> >> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> >> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> >> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> >> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>> >>
>> >> So this is a good first step though might I suggest one other change.
>> >>
>> >> We may want to add logic to the core PCIe code that clears the "Enable
>> >> Relaxed Ordering" bit in the device control register for all devices
>> >> hanging off of this root complex. Assuming the devices conform to the
>> >> PCIe spec doing that should disable relaxed ordering in a device
>> >> agnostic way that then enables us at a driver level to just enable the
>> >> feature always without having to perform any checks for your flag. We
>> >> could probably do that as a part of probing the PCIe interfaces
>> >> hanging off of these devices.
>> >
>> > I suppose you don't want to turn off RO completely on the device. When
>> > traffic is targetted to mmio for peer to peer reasons RO has performance
>> > upside. The specific issue with these root ports indicate limitation using
>> > RO for traffic targetting coherent memory.
>>
>> Actually my main concern here is virtualization. If I take the PCIe
>> function and direct assign it I have no way of seeing the root complex
>> flag as it is now virtualized away. In the meantime the guest now has
>> the ability to enable the function and sees nothing that says you
>> can't enable relaxed ordering which in turn ends up potentially
>> causing data corruption on the system. I want relaxed ordering
>> disabled before I even consider assigning it to the guest on the
>> systems where this would be an issue.
>>
>> I prefer to err on the side of caution with this. Enabling Relaxed
>> Ordering is technically a performance enhancement, so we function but
>> not as well as we would like, while having it enabled when there are
>> issues can lead to data corruption. I would weigh the risk of data
>> corruption the thing to be avoided and of much higher priority than
>> enabling improved performance. As such I say we should default the
>> relaxed ordering attribute to off in general and look at
>> "white-listing" it in for various architectures and/or chipsets that
>> support/need it rather than having it enabled by default and trying to
>> switch it off after the fact when we find some new issue.
>
> I agree, after thinking about it a bit more.. even for transactions going to
> p2p, i'm just reading the pcie spec and some sections aren't super clear
> about completion redirect and ACS rules for p2p.
>
> Also it appears the device control default value is 1 for enabling
> Relaxed Ordering. This means we should probably save these states across
> resets/FLR for e.g. To ensure perf isn't affected after a FLR.

Right. That should happen automatically with the PCIe configuration
being saved/restored.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-02 18:10         ` Alexander Duyck
@ 2017-05-03  4:30           ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-03  4:30 UTC (permalink / raw)
  To: Alexander Duyck, Raj, Ashok
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Amir Ancel, Bjorn Helgaas, Ganesh GR,
	Ding Tianhong, Jeff Kirsher, leedom, Bob Shaw, Patrick J Cramer,
	Arjun V.,
	Michael Werner, linux-arm-kernel, Netdev, David Laight,
	Suravee Suthikulpanit, Robin Murphy, David Miller

| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Tuesday, May 2, 2017 11:10 AM
| ...
| So for example, in the case of x86 it seems like there are multiple
| root complexes that have issues, and the gains for enabling it with
| standard DMA to host memory are small. As such we may want to default
| it to off via the architecture specific PCIe code and then look at
| having "white-list" cases where we enable it for things like
| peer-to-peer accesses. In the case of SPARC we could look at
| defaulting it to on, and only "black-list" any cases where there might
| be issues since SPARC relies on this in a significant way for
| performance. In the case of ARM and other architectures it is a bit of
| a toss-up. I would say we could just default it to on for now and
| "black-list" anything that doesn't work, or we could go the other way
| like I suggested for x86. It all depends on what the ARM community
| would want to agree on for this. I would say unless it makes a
| significant difference like it does in the case of SPARC we are
| probably better off just defaulting it to off.

  Sorry, I forgot to respond to this earlier when someone was rushing me out
to a customer dinner.

  I'm unaware of any other x86 Root Complex Port that has a problem with
Relaxed Ordering other than the performance issue with the current Intel
implementation.  Ashok tells me that Intel is in the final stages of putting
together a technical notice on this issue but I don't know when that will
come out.  Hopefully that will shed much more light on the cause and actual
use of Relaxed Ordering when directed to Coherent Memory on current and past
Intel platforms.  (Note that the performance bug seems to limit us to
~75-85Gb/s DMA Write speed to Coherent Host Memory.)

  The only other Device that I currently know of which has issues with
Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
to Data Corruption.

  But I don't know anything about other x86 Root Complex Ports having issues
with this -- we've been using it ever since commit ef306b50b from December
2010.

  Also, I'm not aware of any performance data which has been gathered on the
use of Relaxed Ordering when directed at Host Memory.  From your note, it
sounds like it's important on SPARC architectures.  But it could conceivably
be important on any architecture or Root Complex/Memory Controller
implementation.  We use it to direct Ingress Packet Data to Free List
Buffers, followed by a TLP without Relaxed Ordering directed at a Host
Memory Message Queue.  The idea here is to give the Root Complex options on
which DMA Memory Write TLPs to process in order to optimize data placement
in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
even more important.

  Basically, I'd hate to come up with a solution where we write off the use
of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
suggesting that, but there are a number of x86 Root Complex implementations
out there -- and some like the new AMD Ryzen have yet to be tested -- as
well as other architectures.

  And, as Ashok and I have both nothed, any solution we come up with needs
to cope with a heterogeneous situation where, on the same PCIe Fabric, it
may be necessesary/desireable to support Relaxed Ordering TLPs directed at
some End Nodes but not others.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-03  4:30           ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-03  4:30 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Tuesday, May 2, 2017 11:10 AM
| ...
| So for example, in the case of x86 it seems like there are multiple
| root complexes that have issues, and the gains for enabling it with
| standard DMA to host memory are small. As such we may want to default
| it to off via the architecture specific PCIe code and then look at
| having "white-list" cases where we enable it for things like
| peer-to-peer accesses. In the case of SPARC we could look at
| defaulting it to on, and only "black-list" any cases where there might
| be issues since SPARC relies on this in a significant way for
| performance. In the case of ARM and other architectures it is a bit of
| a toss-up. I would say we could just default it to on for now and
| "black-list" anything that doesn't work, or we could go the other way
| like I suggested for x86. It all depends on what the ARM community
| would want to agree on for this. I would say unless it makes a
| significant difference like it does in the case of SPARC we are
| probably better off just defaulting it to off.

  Sorry, I forgot to respond to this earlier when someone was rushing me out
to a customer dinner.

  I'm unaware of any other x86 Root Complex Port that has a problem with
Relaxed Ordering other than the performance issue with the current Intel
implementation.  Ashok tells me that Intel is in the final stages of putting
together a technical notice on this issue but I don't know when that will
come out.  Hopefully that will shed much more light on the cause and actual
use of Relaxed Ordering when directed to Coherent Memory on current and past
Intel platforms.  (Note that the performance bug seems to limit us to
~75-85Gb/s DMA Write speed to Coherent Host Memory.)

  The only other Device that I currently know of which has issues with
Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
to Data Corruption.

  But I don't know anything about other x86 Root Complex Ports having issues
with this -- we've been using it ever since commit ef306b50b from December
2010.

  Also, I'm not aware of any performance data which has been gathered on the
use of Relaxed Ordering when directed at Host Memory.  From your note, it
sounds like it's important on SPARC architectures.  But it could conceivably
be important on any architecture or Root Complex/Memory Controller
implementation.  We use it to direct Ingress Packet Data to Free List
Buffers, followed by a TLP without Relaxed Ordering directed at a Host
Memory Message Queue.  The idea here is to give the Root Complex options on
which DMA Memory Write TLPs to process in order to optimize data placement
in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
even more important.

  Basically, I'd hate to come up with a solution where we write off the use
of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
suggesting that, but there are a number of x86 Root Complex implementations
out there -- and some like the new AMD Ryzen have yet to be tested -- as
well as other architectures.

  And, as Ashok and I have both nothed, any solution we come up with needs
to cope with a heterogeneous situation where, on the same PCIe Fabric, it
may be necessesary/desireable to support Relaxed Ordering TLPs directed at
some End Nodes but not others.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-03  4:30           ` Casey Leedom
@ 2017-05-03 16:02             ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-03 16:02 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Raj, Ashok, Bjorn Helgaas, leedom, Michael Werner, Ganesh GR,
	Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Ding Tianhong, Mark Rutland, Amir Ancel,
	Gabriele Paoloni, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, Jeff Kirsher

On Tue, May 2, 2017 at 9:30 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Tuesday, May 2, 2017 11:10 AM
> | ...
> | So for example, in the case of x86 it seems like there are multiple
> | root complexes that have issues, and the gains for enabling it with
> | standard DMA to host memory are small. As such we may want to default
> | it to off via the architecture specific PCIe code and then look at
> | having "white-list" cases where we enable it for things like
> | peer-to-peer accesses. In the case of SPARC we could look at
> | defaulting it to on, and only "black-list" any cases where there might
> | be issues since SPARC relies on this in a significant way for
> | performance. In the case of ARM and other architectures it is a bit of
> | a toss-up. I would say we could just default it to on for now and
> | "black-list" anything that doesn't work, or we could go the other way
> | like I suggested for x86. It all depends on what the ARM community
> | would want to agree on for this. I would say unless it makes a
> | significant difference like it does in the case of SPARC we are
> | probably better off just defaulting it to off.
>
>   Sorry, I forgot to respond to this earlier when someone was rushing me out
> to a customer dinner.
>
>   I'm unaware of any other x86 Root Complex Port that has a problem with
> Relaxed Ordering other than the performance issue with the current Intel
> implementation.  Ashok tells me that Intel is in the final stages of putting
> together a technical notice on this issue but I don't know when that will
> come out.  Hopefully that will shed much more light on the cause and actual
> use of Relaxed Ordering when directed to Coherent Memory on current and past
> Intel platforms.  (Note that the performance bug seems to limit us to
> ~75-85Gb/s DMA Write speed to Coherent Host Memory.)

So my concern isn't so much about existing issues as it is about where
is the advantage in enabling it. We have had support in the Intel
hardware for enabling relaxed ordering for about 10 years. In all that
time I have yet to see an x86 platform that sees any real benefit from
enabling it for standard DMA. That is why my preference would be to
leave it disabled by default on x86 and we white list it in at some
point when hardware shows that there is a benefit to be had for
enabling it.

>   The only other Device that I currently know of which has issues with
> Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
> AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
> to Data Corruption.
>
>   But I don't know anything about other x86 Root Complex Ports having issues
> with this -- we've been using it ever since commit ef306b50b from December
> 2010.

So the question I would have for you then, is what benefits have you
seen from enabling it on x86? In our case we haven't seen any for
transactions that go through the root complex. If you are seeing
benefits would I be correct in assuming it is for your peer-to-peer
case or were there some x86 platforms that showed gains?

>   Also, I'm not aware of any performance data which has been gathered on the
> use of Relaxed Ordering when directed at Host Memory.  From your note, it
> sounds like it's important on SPARC architectures.  But it could conceivably
> be important on any architecture or Root Complex/Memory Controller
> implementation.  We use it to direct Ingress Packet Data to Free List
> Buffers, followed by a TLP without Relaxed Ordering directed at a Host
> Memory Message Queue.  The idea here is to give the Root Complex options on
> which DMA Memory Write TLPs to process in order to optimize data placement
> in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
> even more important.

The operative term here is "may be". That is my concern. We are
leaving this enabled by default and there are known hardware that have
issues that can be pretty serious. I'm not saying we have to disable
it and keep it disabled, but I would like to see us intelligently
enable this feature so that it is enabled on the platforms that show
benefit and disabled on the ones that don't.

My biggest concern with all this is introducing regressions as drivers
like igb and ixgbe are used on a wide range of platforms beyond even
what is covered by x86 and I would prefer not to suddenly have a
deluge of bugs to sort out triggered by us enabling relaxed ordering
on platforms that historically have not had it enabled on them.

>   Basically, I'd hate to come up with a solution where we write off the use
> of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
> suggesting that, but there are a number of x86 Root Complex implementations
> out there -- and some like the new AMD Ryzen have yet to be tested -- as
> well as other architectures.

I'm not saying that we have to write off the use of Relaxed Ordering,
what I am saying is that we should probably be more judicious about
how we go about enabling it. If a platform and/or architecture has no
benefit to enabling it what is the point in adding the possible risk?

Hopefully the AMD Ryzen platform has already been tested and doesn't
need a quirk to disable relaxed ordering. Really it shouldn't fall on
the likes of us to be testing for those kind of things.

>   And, as Ashok and I have both nothed, any solution we come up with needs
> to cope with a heterogeneous situation where, on the same PCIe Fabric, it
> may be necessesary/desireable to support Relaxed Ordering TLPs directed at
> some End Nodes but not others.
>
> Casey

It sounds like we are more or less in agreement. My only concern is
really what we default this to. On x86 I would say we could probably
default this to disabled for existing platforms since my understanding
is that relaxed ordering doesn't provide much benefit on what is out
there right now when performing DMA through the root complex. As far
as peer-to-peer I would say we should probably look at enabling the
ability to have Relaxed Ordering enabled for some channels but not
others. In those cases the hardware needs to be smart enough to allow
for you to indicate you want it disabled by default for most of your
DMA channels, and then enabled for the select channels that are
handling the peer-to-peer traffic.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-03 16:02             ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-03 16:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 2, 2017 at 9:30 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Tuesday, May 2, 2017 11:10 AM
> | ...
> | So for example, in the case of x86 it seems like there are multiple
> | root complexes that have issues, and the gains for enabling it with
> | standard DMA to host memory are small. As such we may want to default
> | it to off via the architecture specific PCIe code and then look at
> | having "white-list" cases where we enable it for things like
> | peer-to-peer accesses. In the case of SPARC we could look at
> | defaulting it to on, and only "black-list" any cases where there might
> | be issues since SPARC relies on this in a significant way for
> | performance. In the case of ARM and other architectures it is a bit of
> | a toss-up. I would say we could just default it to on for now and
> | "black-list" anything that doesn't work, or we could go the other way
> | like I suggested for x86. It all depends on what the ARM community
> | would want to agree on for this. I would say unless it makes a
> | significant difference like it does in the case of SPARC we are
> | probably better off just defaulting it to off.
>
>   Sorry, I forgot to respond to this earlier when someone was rushing me out
> to a customer dinner.
>
>   I'm unaware of any other x86 Root Complex Port that has a problem with
> Relaxed Ordering other than the performance issue with the current Intel
> implementation.  Ashok tells me that Intel is in the final stages of putting
> together a technical notice on this issue but I don't know when that will
> come out.  Hopefully that will shed much more light on the cause and actual
> use of Relaxed Ordering when directed to Coherent Memory on current and past
> Intel platforms.  (Note that the performance bug seems to limit us to
> ~75-85Gb/s DMA Write speed to Coherent Host Memory.)

So my concern isn't so much about existing issues as it is about where
is the advantage in enabling it. We have had support in the Intel
hardware for enabling relaxed ordering for about 10 years. In all that
time I have yet to see an x86 platform that sees any real benefit from
enabling it for standard DMA. That is why my preference would be to
leave it disabled by default on x86 and we white list it in at some
point when hardware shows that there is a benefit to be had for
enabling it.

>   The only other Device that I currently know of which has issues with
> Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
> AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
> to Data Corruption.
>
>   But I don't know anything about other x86 Root Complex Ports having issues
> with this -- we've been using it ever since commit ef306b50b from December
> 2010.

So the question I would have for you then, is what benefits have you
seen from enabling it on x86? In our case we haven't seen any for
transactions that go through the root complex. If you are seeing
benefits would I be correct in assuming it is for your peer-to-peer
case or were there some x86 platforms that showed gains?

>   Also, I'm not aware of any performance data which has been gathered on the
> use of Relaxed Ordering when directed at Host Memory.  From your note, it
> sounds like it's important on SPARC architectures.  But it could conceivably
> be important on any architecture or Root Complex/Memory Controller
> implementation.  We use it to direct Ingress Packet Data to Free List
> Buffers, followed by a TLP without Relaxed Ordering directed at a Host
> Memory Message Queue.  The idea here is to give the Root Complex options on
> which DMA Memory Write TLPs to process in order to optimize data placement
> in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
> even more important.

The operative term here is "may be". That is my concern. We are
leaving this enabled by default and there are known hardware that have
issues that can be pretty serious. I'm not saying we have to disable
it and keep it disabled, but I would like to see us intelligently
enable this feature so that it is enabled on the platforms that show
benefit and disabled on the ones that don't.

My biggest concern with all this is introducing regressions as drivers
like igb and ixgbe are used on a wide range of platforms beyond even
what is covered by x86 and I would prefer not to suddenly have a
deluge of bugs to sort out triggered by us enabling relaxed ordering
on platforms that historically have not had it enabled on them.

>   Basically, I'd hate to come up with a solution where we write off the use
> of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
> suggesting that, but there are a number of x86 Root Complex implementations
> out there -- and some like the new AMD Ryzen have yet to be tested -- as
> well as other architectures.

I'm not saying that we have to write off the use of Relaxed Ordering,
what I am saying is that we should probably be more judicious about
how we go about enabling it. If a platform and/or architecture has no
benefit to enabling it what is the point in adding the possible risk?

Hopefully the AMD Ryzen platform has already been tested and doesn't
need a quirk to disable relaxed ordering. Really it shouldn't fall on
the likes of us to be testing for those kind of things.

>   And, as Ashok and I have both nothed, any solution we come up with needs
> to cope with a heterogeneous situation where, on the same PCIe Fabric, it
> may be necessesary/desireable to support Relaxed Ordering TLPs directed at
> some End Nodes but not others.
>
> Casey

It sounds like we are more or less in agreement. My only concern is
really what we default this to. On x86 I would say we could probably
default this to disabled for existing platforms since my understanding
is that relaxed ordering doesn't provide much benefit on what is out
there right now when performing DMA through the root complex. As far
as peer-to-peer I would say we should probably look at enabling the
ability to have Relaxed Ordering enabled for some channels but not
others. In those cases the hardware needs to be smart enough to allow
for you to indicate you want it disabled by default for most of your
DMA channels, and then enabled for the select channels that are
handling the peer-to-peer traffic.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-03 16:02             ` Alexander Duyck
@ 2017-05-04 21:01               ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-04 21:01 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Raj, Ashok, Bjorn Helgaas, Michael Werner, Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Ding Tianhong, Mark Rutland, Amir Ancel,
	Gabriele Paoloni, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, Jeff Kirsher, Netdev

| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, May 3, 2017 9:02 AM
| ...
| It sounds like we are more or less in agreement. My only concern is
| really what we default this to. On x86 I would say we could probably
| default this to disabled for existing platforms since my understanding
| is that relaxed ordering doesn't provide much benefit on what is out
| there right now when performing DMA through the root complex. As far
| as peer-to-peer I would say we should probably look at enabling the
| ability to have Relaxed Ordering enabled for some channels but not
| others. In those cases the hardware needs to be smart enough to allow
| for you to indicate you want it disabled by default for most of your
| DMA channels, and then enabled for the select channels that are
| handling the peer-to-peer traffic.

  Yes, I think that we are mostly in agreement.  I had just wanted to make
sure that whatever scheme was developed would allow for simultaneously
supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
Ordering for others within the same system.  I.e. not simply
enabling/disabling/etc.  based solely on System Platform Architecture.

  By the way, I've started our QA folks off looking at what things look like
in Linux Virtual Machines under different Hypervisors to see what
information they may provide to the VM in the way of what Root Complex Port
is being used, etc.  So far they've got Windows HyperV done and there
there's no PCIe Fabric exposed in any way: just the attached device.  I'll
have to see what pci_find_pcie_root_port() returns in that environment.
Maybe NULL?

  With your reservations (which I also share), I think that it probably
makes sense to have a per-architecture definition of the "Can I Use Relaxed
Ordering With TLPs Directed At This End Point" predicate, with the default
being "No" for any architecture which doesn't implement the predicate.  And
if the specified (struct pci_dev *) End Node is NULL, it ought to return
False for that as well.  I can't see any reason to pass in the Source End
Node but I may be missing something.

  At this point, this is pretty far outside my level of expertise.  I'm
happy to give it a go, but I'd be even happier if someone with a lot more
experience in the PCIe Infrastructure were to want to carry the ball
forward.  I'm not super familiar with the Linux Kernel "Rules Of
Engagement", so let me know what my next step should be.  Thanks.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-04 21:01               ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-04 21:01 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Alexander Duyck <alexander.duyck@gmail.com>
| Sent: Wednesday, May 3, 2017 9:02 AM
| ...
| It sounds like we are more or less in agreement. My only concern is
| really what we default this to. On x86 I would say we could probably
| default this to disabled for existing platforms since my understanding
| is that relaxed ordering doesn't provide much benefit on what is out
| there right now when performing DMA through the root complex. As far
| as peer-to-peer I would say we should probably look at enabling the
| ability to have Relaxed Ordering enabled for some channels but not
| others. In those cases the hardware needs to be smart enough to allow
| for you to indicate you want it disabled by default for most of your
| DMA channels, and then enabled for the select channels that are
| handling the peer-to-peer traffic.

  Yes, I think that we are mostly in agreement.  I had just wanted to make
sure that whatever scheme was developed would allow for simultaneously
supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
Ordering for others within the same system.  I.e. not simply
enabling/disabling/etc.  based solely on System Platform Architecture.

  By the way, I've started our QA folks off looking at what things look like
in Linux Virtual Machines under different Hypervisors to see what
information they may provide to the VM in the way of what Root Complex Port
is being used, etc.  So far they've got Windows HyperV done and there
there's no PCIe Fabric exposed in any way: just the attached device.  I'll
have to see what pci_find_pcie_root_port() returns in that environment.
Maybe NULL?

  With your reservations (which I also share), I think that it probably
makes sense to have a per-architecture definition of the "Can I Use Relaxed
Ordering With TLPs Directed At This End Point" predicate, with the default
being "No" for any architecture which doesn't implement the predicate.  And
if the specified (struct pci_dev *) End Node is NULL, it ought to return
False for that as well.  I can't see any reason to pass in the Source End
Node but I may be missing something.

  At this point, this is pretty far outside my level of expertise.  I'm
happy to give it a go, but I'd be even happier if someone with a lot more
experience in the PCIe Infrastructure were to want to carry the ball
forward.  I'm not super familiar with the Linux Kernel "Rules Of
Engagement", so let me know what my next step should be.  Thanks.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-04 21:01               ` Casey Leedom
@ 2017-05-05 14:04                 ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-05 14:04 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Raj, Ashok, Bjorn Helgaas, Michael Werner, Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Ding Tianhong, Mark Rutland, Amir Ancel,
	Gabriele Paoloni, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, Jeff Kirsher, Netdev

On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Sent: Wednesday, May 3, 2017 9:02 AM
> | ...
> | It sounds like we are more or less in agreement. My only concern is
> | really what we default this to. On x86 I would say we could probably
> | default this to disabled for existing platforms since my understanding
> | is that relaxed ordering doesn't provide much benefit on what is out
> | there right now when performing DMA through the root complex. As far
> | as peer-to-peer I would say we should probably look at enabling the
> | ability to have Relaxed Ordering enabled for some channels but not
> | others. In those cases the hardware needs to be smart enough to allow
> | for you to indicate you want it disabled by default for most of your
> | DMA channels, and then enabled for the select channels that are
> | handling the peer-to-peer traffic.
>
>   Yes, I think that we are mostly in agreement.  I had just wanted to make
> sure that whatever scheme was developed would allow for simultaneously
> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
> Ordering for others within the same system.  I.e. not simply
> enabling/disabling/etc.  based solely on System Platform Architecture.
>
>   By the way, I've started our QA folks off looking at what things look like
> in Linux Virtual Machines under different Hypervisors to see what
> information they may provide to the VM in the way of what Root Complex Port
> is being used, etc.  So far they've got Windows HyperV done and there
> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
> have to see what pci_find_pcie_root_port() returns in that environment.
> Maybe NULL?

I believe NULL is one of the options. It all depends on what qemu is
emulating. Most likely you won't find a pcie root port on KVM because
the default is to emulate an older system that only supports PCI.

>   With your reservations (which I also share), I think that it probably
> makes sense to have a per-architecture definition of the "Can I Use Relaxed
> Ordering With TLPs Directed At This End Point" predicate, with the default
> being "No" for any architecture which doesn't implement the predicate.  And
> if the specified (struct pci_dev *) End Node is NULL, it ought to return
> False for that as well.  I can't see any reason to pass in the Source End
> Node but I may be missing something.
>
>   At this point, this is pretty far outside my level of expertise.  I'm
> happy to give it a go, but I'd be even happier if someone with a lot more
> experience in the PCIe Infrastructure were to want to carry the ball
> forward.  I'm not super familiar with the Linux Kernel "Rules Of
> Engagement", so let me know what my next step should be.  Thanks.
>
> Casey

For now we can probably keep this on the linux-pci mailing list. Going
that route is the most straight forward for now since step one is
probably just making sure we are setting the relaxed ordering bit in
the setups that make sense. I would say we could probably keep it
simple. We just need to enable relaxed ordering by default for SPARC
architectures, on most others we can probably default it to off.

I believe this all had started as Ding Tianhong was hoping to enable
this for the ARM architecture. That is the only one I can think of
where it might be difficult to figure out which way to default as we
were attempting to follow the same code that was enabled for SPARC and
that is what started this tug-of-war about how this should be done.
What we might do is take care of this in two phases. The first one
enables the infrastructure generically but leaves it defaulted to off
for everyone but SPARC. Then we can go through and start enabling it
for other platforms such as some of those on ARM in the platforms that
Ding Tianhong was working with.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-05 14:04                 ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-05 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Sent: Wednesday, May 3, 2017 9:02 AM
> | ...
> | It sounds like we are more or less in agreement. My only concern is
> | really what we default this to. On x86 I would say we could probably
> | default this to disabled for existing platforms since my understanding
> | is that relaxed ordering doesn't provide much benefit on what is out
> | there right now when performing DMA through the root complex. As far
> | as peer-to-peer I would say we should probably look at enabling the
> | ability to have Relaxed Ordering enabled for some channels but not
> | others. In those cases the hardware needs to be smart enough to allow
> | for you to indicate you want it disabled by default for most of your
> | DMA channels, and then enabled for the select channels that are
> | handling the peer-to-peer traffic.
>
>   Yes, I think that we are mostly in agreement.  I had just wanted to make
> sure that whatever scheme was developed would allow for simultaneously
> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
> Ordering for others within the same system.  I.e. not simply
> enabling/disabling/etc.  based solely on System Platform Architecture.
>
>   By the way, I've started our QA folks off looking at what things look like
> in Linux Virtual Machines under different Hypervisors to see what
> information they may provide to the VM in the way of what Root Complex Port
> is being used, etc.  So far they've got Windows HyperV done and there
> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
> have to see what pci_find_pcie_root_port() returns in that environment.
> Maybe NULL?

I believe NULL is one of the options. It all depends on what qemu is
emulating. Most likely you won't find a pcie root port on KVM because
the default is to emulate an older system that only supports PCI.

>   With your reservations (which I also share), I think that it probably
> makes sense to have a per-architecture definition of the "Can I Use Relaxed
> Ordering With TLPs Directed At This End Point" predicate, with the default
> being "No" for any architecture which doesn't implement the predicate.  And
> if the specified (struct pci_dev *) End Node is NULL, it ought to return
> False for that as well.  I can't see any reason to pass in the Source End
> Node but I may be missing something.
>
>   At this point, this is pretty far outside my level of expertise.  I'm
> happy to give it a go, but I'd be even happier if someone with a lot more
> experience in the PCIe Infrastructure were to want to carry the ball
> forward.  I'm not super familiar with the Linux Kernel "Rules Of
> Engagement", so let me know what my next step should be.  Thanks.
>
> Casey

For now we can probably keep this on the linux-pci mailing list. Going
that route is the most straight forward for now since step one is
probably just making sure we are setting the relaxed ordering bit in
the setups that make sense. I would say we could probably keep it
simple. We just need to enable relaxed ordering by default for SPARC
architectures, on most others we can probably default it to off.

I believe this all had started as Ding Tianhong was hoping to enable
this for the ARM architecture. That is the only one I can think of
where it might be difficult to figure out which way to default as we
were attempting to follow the same code that was enabled for SPARC and
that is what started this tug-of-war about how this should be done.
What we might do is take care of this in two phases. The first one
enables the infrastructure generically but leaves it defaulted to off
for everyone but SPARC. Then we can go through and start enabling it
for other platforms such as some of those on ARM in the platforms that
Ding Tianhong was working with.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-05 14:04                 ` Alexander Duyck
@ 2017-05-06  3:08                   ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-06  3:08 UTC (permalink / raw)
  To: Alexander Duyck, Casey Leedom
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Raj, Ashok, Bjorn Helgaas, Ganesh GR,
	Jeff Kirsher, Bob Shaw, Patrick J Cramer, Arjun V.,
	Michael Werner, linux-arm-kernel, Amir Ancel, Netdev,
	David Laight, Suravee Suthikulpanit, Robin Murphy, David Miller,
	h



On 2017/5/5 22:04, Alexander Duyck wrote:
> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Sent: Wednesday, May 3, 2017 9:02 AM
>> | ...
>> | It sounds like we are more or less in agreement. My only concern is
>> | really what we default this to. On x86 I would say we could probably
>> | default this to disabled for existing platforms since my understanding
>> | is that relaxed ordering doesn't provide much benefit on what is out
>> | there right now when performing DMA through the root complex. As far
>> | as peer-to-peer I would say we should probably look at enabling the
>> | ability to have Relaxed Ordering enabled for some channels but not
>> | others. In those cases the hardware needs to be smart enough to allow
>> | for you to indicate you want it disabled by default for most of your
>> | DMA channels, and then enabled for the select channels that are
>> | handling the peer-to-peer traffic.
>>
>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>> sure that whatever scheme was developed would allow for simultaneously
>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>> Ordering for others within the same system.  I.e. not simply
>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>
>>   By the way, I've started our QA folks off looking at what things look like
>> in Linux Virtual Machines under different Hypervisors to see what
>> information they may provide to the VM in the way of what Root Complex Port
>> is being used, etc.  So far they've got Windows HyperV done and there
>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>> have to see what pci_find_pcie_root_port() returns in that environment.
>> Maybe NULL?
> 
> I believe NULL is one of the options. It all depends on what qemu is
> emulating. Most likely you won't find a pcie root port on KVM because
> the default is to emulate an older system that only supports PCI.
> 
>>   With your reservations (which I also share), I think that it probably
>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>> Ordering With TLPs Directed At This End Point" predicate, with the default
>> being "No" for any architecture which doesn't implement the predicate.  And
>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>> False for that as well.  I can't see any reason to pass in the Source End
>> Node but I may be missing something.
>>
>>   At this point, this is pretty far outside my level of expertise.  I'm
>> happy to give it a go, but I'd be even happier if someone with a lot more
>> experience in the PCIe Infrastructure were to want to carry the ball
>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>> Engagement", so let me know what my next step should be.  Thanks.
>>
>> Casey
> 
> For now we can probably keep this on the linux-pci mailing list. Going
> that route is the most straight forward for now since step one is
> probably just making sure we are setting the relaxed ordering bit in
> the setups that make sense. I would say we could probably keep it
> simple. We just need to enable relaxed ordering by default for SPARC
> architectures, on most others we can probably default it to off.
> 

Casey, Alexander:

Thanks for the wonderful discussion, it is more clearly that what to do next,
I agree that enable relaxed ordering by default only for SPARC and ARM64
is more safe for all the other platform, as no one want to break anything.

> I believe this all had started as Ding Tianhong was hoping to enable
> this for the ARM architecture. That is the only one I can think of
> where it might be difficult to figure out which way to default as we
> were attempting to follow the same code that was enabled for SPARC and
> that is what started this tug-of-war about how this should be done.
> What we might do is take care of this in two phases. The first one
> enables the infrastructure generically but leaves it defaulted to off
> for everyone but SPARC. Then we can go through and start enabling it
> for other platforms such as some of those on ARM in the platforms that
> Ding Tianhong was working with.
> 

According the suggestion, I could only think of this code:

@@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
                              quirk_tw686x_class);

+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+ if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
+     dev->vendor != PCI_VENDOR_ID_SUN)
+         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
+                       quirk_relaxedordering_disable);
+
 /*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the


What do you think of it?

Thanks
Ding


> - Alex
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-06  3:08                   ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-06  3:08 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/5/5 22:04, Alexander Duyck wrote:
> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Sent: Wednesday, May 3, 2017 9:02 AM
>> | ...
>> | It sounds like we are more or less in agreement. My only concern is
>> | really what we default this to. On x86 I would say we could probably
>> | default this to disabled for existing platforms since my understanding
>> | is that relaxed ordering doesn't provide much benefit on what is out
>> | there right now when performing DMA through the root complex. As far
>> | as peer-to-peer I would say we should probably look at enabling the
>> | ability to have Relaxed Ordering enabled for some channels but not
>> | others. In those cases the hardware needs to be smart enough to allow
>> | for you to indicate you want it disabled by default for most of your
>> | DMA channels, and then enabled for the select channels that are
>> | handling the peer-to-peer traffic.
>>
>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>> sure that whatever scheme was developed would allow for simultaneously
>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>> Ordering for others within the same system.  I.e. not simply
>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>
>>   By the way, I've started our QA folks off looking at what things look like
>> in Linux Virtual Machines under different Hypervisors to see what
>> information they may provide to the VM in the way of what Root Complex Port
>> is being used, etc.  So far they've got Windows HyperV done and there
>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>> have to see what pci_find_pcie_root_port() returns in that environment.
>> Maybe NULL?
> 
> I believe NULL is one of the options. It all depends on what qemu is
> emulating. Most likely you won't find a pcie root port on KVM because
> the default is to emulate an older system that only supports PCI.
> 
>>   With your reservations (which I also share), I think that it probably
>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>> Ordering With TLPs Directed At This End Point" predicate, with the default
>> being "No" for any architecture which doesn't implement the predicate.  And
>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>> False for that as well.  I can't see any reason to pass in the Source End
>> Node but I may be missing something.
>>
>>   At this point, this is pretty far outside my level of expertise.  I'm
>> happy to give it a go, but I'd be even happier if someone with a lot more
>> experience in the PCIe Infrastructure were to want to carry the ball
>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>> Engagement", so let me know what my next step should be.  Thanks.
>>
>> Casey
> 
> For now we can probably keep this on the linux-pci mailing list. Going
> that route is the most straight forward for now since step one is
> probably just making sure we are setting the relaxed ordering bit in
> the setups that make sense. I would say we could probably keep it
> simple. We just need to enable relaxed ordering by default for SPARC
> architectures, on most others we can probably default it to off.
> 

Casey, Alexander:

Thanks for the wonderful discussion, it is more clearly that what to do next,
I agree that enable relaxed ordering by default only for SPARC and ARM64
is more safe for all the other platform, as no one want to break anything.

> I believe this all had started as Ding Tianhong was hoping to enable
> this for the ARM architecture. That is the only one I can think of
> where it might be difficult to figure out which way to default as we
> were attempting to follow the same code that was enabled for SPARC and
> that is what started this tug-of-war about how this should be done.
> What we might do is take care of this in two phases. The first one
> enables the infrastructure generically but leaves it defaulted to off
> for everyone but SPARC. Then we can go through and start enabling it
> for other platforms such as some of those on ARM in the platforms that
> Ding Tianhong was working with.
> 

According the suggestion, I could only think of this code:

@@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
                              quirk_tw686x_class);

+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+ if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
+     dev->vendor != PCI_VENDOR_ID_SUN)
+         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
+                       quirk_relaxedordering_disable);
+
 /*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the


What do you think of it?

Thanks
Ding


> - Alex
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-06  3:08                   ` Ding Tianhong
@ 2017-05-06 18:07                     ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-06 18:07 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Casey Leedom, Raj, Ashok, Bjorn Helgaas, Michael Werner,
	Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev

On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/5 22:04, Alexander Duyck wrote:
>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>> | ...
>>> | It sounds like we are more or less in agreement. My only concern is
>>> | really what we default this to. On x86 I would say we could probably
>>> | default this to disabled for existing platforms since my understanding
>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>> | there right now when performing DMA through the root complex. As far
>>> | as peer-to-peer I would say we should probably look at enabling the
>>> | ability to have Relaxed Ordering enabled for some channels but not
>>> | others. In those cases the hardware needs to be smart enough to allow
>>> | for you to indicate you want it disabled by default for most of your
>>> | DMA channels, and then enabled for the select channels that are
>>> | handling the peer-to-peer traffic.
>>>
>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>> sure that whatever scheme was developed would allow for simultaneously
>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>> Ordering for others within the same system.  I.e. not simply
>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>
>>>   By the way, I've started our QA folks off looking at what things look like
>>> in Linux Virtual Machines under different Hypervisors to see what
>>> information they may provide to the VM in the way of what Root Complex Port
>>> is being used, etc.  So far they've got Windows HyperV done and there
>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>> Maybe NULL?
>>
>> I believe NULL is one of the options. It all depends on what qemu is
>> emulating. Most likely you won't find a pcie root port on KVM because
>> the default is to emulate an older system that only supports PCI.
>>
>>>   With your reservations (which I also share), I think that it probably
>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>> being "No" for any architecture which doesn't implement the predicate.  And
>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>> False for that as well.  I can't see any reason to pass in the Source End
>>> Node but I may be missing something.
>>>
>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>> experience in the PCIe Infrastructure were to want to carry the ball
>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>> Engagement", so let me know what my next step should be.  Thanks.
>>>
>>> Casey
>>
>> For now we can probably keep this on the linux-pci mailing list. Going
>> that route is the most straight forward for now since step one is
>> probably just making sure we are setting the relaxed ordering bit in
>> the setups that make sense. I would say we could probably keep it
>> simple. We just need to enable relaxed ordering by default for SPARC
>> architectures, on most others we can probably default it to off.
>>
>
> Casey, Alexander:
>
> Thanks for the wonderful discussion, it is more clearly that what to do next,
> I agree that enable relaxed ordering by default only for SPARC and ARM64
> is more safe for all the other platform, as no one want to break anything.
>
>> I believe this all had started as Ding Tianhong was hoping to enable
>> this for the ARM architecture. That is the only one I can think of
>> where it might be difficult to figure out which way to default as we
>> were attempting to follow the same code that was enabled for SPARC and
>> that is what started this tug-of-war about how this should be done.
>> What we might do is take care of this in two phases. The first one
>> enables the infrastructure generically but leaves it defaulted to off
>> for everyone but SPARC. Then we can go through and start enabling it
>> for other platforms such as some of those on ARM in the platforms that
>> Ding Tianhong was working with.
>>
>
> According the suggestion, I could only think of this code:
>
> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>                               quirk_tw686x_class);
>
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
> +     dev->vendor != PCI_VENDOR_ID_SUN)
> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
> +                       quirk_relaxedordering_disable);
> +
>  /*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>
>
> What do you think of it?
>
> Thanks
> Ding
>

This is a bit simplistic but it is a start.

The other bit I was getting at is that we need to update the core PCIe
code so that when we configure devices and the root complex reports no
support for relaxed ordering it should be clearing the relaxed
ordering bits in the PCIe configuration registers on the upstream
facing devices.

The last bit we need in all this is a way to allow for setups where
peer-to-peer wants to perform relaxed ordering but for writes to the
host we have to not use relaxed ordering. For that we need to enable a
special case and that isn't handled right now in any of the solutions
we have coded up so far.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-06 18:07                     ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-06 18:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/5 22:04, Alexander Duyck wrote:
>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>> | ...
>>> | It sounds like we are more or less in agreement. My only concern is
>>> | really what we default this to. On x86 I would say we could probably
>>> | default this to disabled for existing platforms since my understanding
>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>> | there right now when performing DMA through the root complex. As far
>>> | as peer-to-peer I would say we should probably look at enabling the
>>> | ability to have Relaxed Ordering enabled for some channels but not
>>> | others. In those cases the hardware needs to be smart enough to allow
>>> | for you to indicate you want it disabled by default for most of your
>>> | DMA channels, and then enabled for the select channels that are
>>> | handling the peer-to-peer traffic.
>>>
>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>> sure that whatever scheme was developed would allow for simultaneously
>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>> Ordering for others within the same system.  I.e. not simply
>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>
>>>   By the way, I've started our QA folks off looking at what things look like
>>> in Linux Virtual Machines under different Hypervisors to see what
>>> information they may provide to the VM in the way of what Root Complex Port
>>> is being used, etc.  So far they've got Windows HyperV done and there
>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>> Maybe NULL?
>>
>> I believe NULL is one of the options. It all depends on what qemu is
>> emulating. Most likely you won't find a pcie root port on KVM because
>> the default is to emulate an older system that only supports PCI.
>>
>>>   With your reservations (which I also share), I think that it probably
>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>> being "No" for any architecture which doesn't implement the predicate.  And
>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>> False for that as well.  I can't see any reason to pass in the Source End
>>> Node but I may be missing something.
>>>
>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>> experience in the PCIe Infrastructure were to want to carry the ball
>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>> Engagement", so let me know what my next step should be.  Thanks.
>>>
>>> Casey
>>
>> For now we can probably keep this on the linux-pci mailing list. Going
>> that route is the most straight forward for now since step one is
>> probably just making sure we are setting the relaxed ordering bit in
>> the setups that make sense. I would say we could probably keep it
>> simple. We just need to enable relaxed ordering by default for SPARC
>> architectures, on most others we can probably default it to off.
>>
>
> Casey, Alexander:
>
> Thanks for the wonderful discussion, it is more clearly that what to do next,
> I agree that enable relaxed ordering by default only for SPARC and ARM64
> is more safe for all the other platform, as no one want to break anything.
>
>> I believe this all had started as Ding Tianhong was hoping to enable
>> this for the ARM architecture. That is the only one I can think of
>> where it might be difficult to figure out which way to default as we
>> were attempting to follow the same code that was enabled for SPARC and
>> that is what started this tug-of-war about how this should be done.
>> What we might do is take care of this in two phases. The first one
>> enables the infrastructure generically but leaves it defaulted to off
>> for everyone but SPARC. Then we can go through and start enabling it
>> for other platforms such as some of those on ARM in the platforms that
>> Ding Tianhong was working with.
>>
>
> According the suggestion, I could only think of this code:
>
> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>                               quirk_tw686x_class);
>
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
> +     dev->vendor != PCI_VENDOR_ID_SUN)
> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
> +                       quirk_relaxedordering_disable);
> +
>  /*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>
>
> What do you think of it?
>
> Thanks
> Ding
>

This is a bit simplistic but it is a start.

The other bit I was getting at is that we need to update the core PCIe
code so that when we configure devices and the root complex reports no
support for relaxed ordering it should be clearing the relaxed
ordering bits in the PCIe configuration registers on the upstream
facing devices.

The last bit we need in all this is a way to allow for setups where
peer-to-peer wants to perform relaxed ordering but for writes to the
host we have to not use relaxed ordering. For that we need to enable a
special case and that isn't handled right now in any of the solutions
we have coded up so far.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-06 18:07                     ` Alexander Duyck
@ 2017-05-08 14:33                       ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-08 14:33 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Raj, Ashok, Bjorn Helgaas, Ganesh GR,
	Jeff Kirsher, Bob Shaw, Casey Leedom, Patrick J Cramer, Arjun V.,
	Michael Werner, linux-arm-kernel, Amir Ancel, Netdev,
	David Laight, Suravee Suthikulpanit, Robin Murphy, David Miller,
	h



On 2017/5/7 2:07, Alexander Duyck wrote:
> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>>
>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>> | ...
>>>> | It sounds like we are more or less in agreement. My only concern is
>>>> | really what we default this to. On x86 I would say we could probably
>>>> | default this to disabled for existing platforms since my understanding
>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>> | there right now when performing DMA through the root complex. As far
>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>> | for you to indicate you want it disabled by default for most of your
>>>> | DMA channels, and then enabled for the select channels that are
>>>> | handling the peer-to-peer traffic.
>>>>
>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>> sure that whatever scheme was developed would allow for simultaneously
>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>> Ordering for others within the same system.  I.e. not simply
>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>
>>>>   By the way, I've started our QA folks off looking at what things look like
>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>> information they may provide to the VM in the way of what Root Complex Port
>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>> Maybe NULL?
>>>
>>> I believe NULL is one of the options. It all depends on what qemu is
>>> emulating. Most likely you won't find a pcie root port on KVM because
>>> the default is to emulate an older system that only supports PCI.
>>>
>>>>   With your reservations (which I also share), I think that it probably
>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>> Node but I may be missing something.
>>>>
>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>
>>>> Casey
>>>
>>> For now we can probably keep this on the linux-pci mailing list. Going
>>> that route is the most straight forward for now since step one is
>>> probably just making sure we are setting the relaxed ordering bit in
>>> the setups that make sense. I would say we could probably keep it
>>> simple. We just need to enable relaxed ordering by default for SPARC
>>> architectures, on most others we can probably default it to off.
>>>
>>
>> Casey, Alexander:
>>
>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>> is more safe for all the other platform, as no one want to break anything.
>>
>>> I believe this all had started as Ding Tianhong was hoping to enable
>>> this for the ARM architecture. That is the only one I can think of
>>> where it might be difficult to figure out which way to default as we
>>> were attempting to follow the same code that was enabled for SPARC and
>>> that is what started this tug-of-war about how this should be done.
>>> What we might do is take care of this in two phases. The first one
>>> enables the infrastructure generically but leaves it defaulted to off
>>> for everyone but SPARC. Then we can go through and start enabling it
>>> for other platforms such as some of those on ARM in the platforms that
>>> Ding Tianhong was working with.
>>>
>>
>> According the suggestion, I could only think of this code:
>>
>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>                               quirk_tw686x_class);
>>
>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>> +{
>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>> +                       quirk_relaxedordering_disable);
>> +
>>  /*
>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>   * values for the Attribute as were supplied in the header of the
>>
>>
>> What do you think of it?
>>
>> Thanks
>> Ding
>>
> 
> This is a bit simplistic but it is a start.
> 
> The other bit I was getting at is that we need to update the core PCIe
> code so that when we configure devices and the root complex reports no
> support for relaxed ordering it should be clearing the relaxed
> ordering bits in the PCIe configuration registers on the upstream
> facing devices.

How about this:
rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
that the pcie dev did not support RO mode.

> 
> The last bit we need in all this is a way to allow for setups where
> peer-to-peer wants to perform relaxed ordering but for writes to the
> host we have to not use relaxed ordering. For that we need to enable a
> special case and that isn't handled right now in any of the solutions
> we have coded up so far.
> 

Sorry I am not clear of this way, can you explain more about this or give me
a special case, thanks a lot.

Ding

> - Alex
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-08 14:33                       ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-08 14:33 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/5/7 2:07, Alexander Duyck wrote:
> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>>
>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>> | ...
>>>> | It sounds like we are more or less in agreement. My only concern is
>>>> | really what we default this to. On x86 I would say we could probably
>>>> | default this to disabled for existing platforms since my understanding
>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>> | there right now when performing DMA through the root complex. As far
>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>> | for you to indicate you want it disabled by default for most of your
>>>> | DMA channels, and then enabled for the select channels that are
>>>> | handling the peer-to-peer traffic.
>>>>
>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>> sure that whatever scheme was developed would allow for simultaneously
>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>> Ordering for others within the same system.  I.e. not simply
>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>
>>>>   By the way, I've started our QA folks off looking at what things look like
>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>> information they may provide to the VM in the way of what Root Complex Port
>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>> Maybe NULL?
>>>
>>> I believe NULL is one of the options. It all depends on what qemu is
>>> emulating. Most likely you won't find a pcie root port on KVM because
>>> the default is to emulate an older system that only supports PCI.
>>>
>>>>   With your reservations (which I also share), I think that it probably
>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>> Node but I may be missing something.
>>>>
>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>
>>>> Casey
>>>
>>> For now we can probably keep this on the linux-pci mailing list. Going
>>> that route is the most straight forward for now since step one is
>>> probably just making sure we are setting the relaxed ordering bit in
>>> the setups that make sense. I would say we could probably keep it
>>> simple. We just need to enable relaxed ordering by default for SPARC
>>> architectures, on most others we can probably default it to off.
>>>
>>
>> Casey, Alexander:
>>
>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>> is more safe for all the other platform, as no one want to break anything.
>>
>>> I believe this all had started as Ding Tianhong was hoping to enable
>>> this for the ARM architecture. That is the only one I can think of
>>> where it might be difficult to figure out which way to default as we
>>> were attempting to follow the same code that was enabled for SPARC and
>>> that is what started this tug-of-war about how this should be done.
>>> What we might do is take care of this in two phases. The first one
>>> enables the infrastructure generically but leaves it defaulted to off
>>> for everyone but SPARC. Then we can go through and start enabling it
>>> for other platforms such as some of those on ARM in the platforms that
>>> Ding Tianhong was working with.
>>>
>>
>> According the suggestion, I could only think of this code:
>>
>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>                               quirk_tw686x_class);
>>
>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>> +{
>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>> +                       quirk_relaxedordering_disable);
>> +
>>  /*
>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>   * values for the Attribute as were supplied in the header of the
>>
>>
>> What do you think of it?
>>
>> Thanks
>> Ding
>>
> 
> This is a bit simplistic but it is a start.
> 
> The other bit I was getting at is that we need to update the core PCIe
> code so that when we configure devices and the root complex reports no
> support for relaxed ordering it should be clearing the relaxed
> ordering bits in the PCIe configuration registers on the upstream
> facing devices.

How about this:
rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
that the pcie dev did not support RO mode.

> 
> The last bit we need in all this is a way to allow for setups where
> peer-to-peer wants to perform relaxed ordering but for writes to the
> host we have to not use relaxed ordering. For that we need to enable a
> special case and that isn't handled right now in any of the solutions
> we have coded up so far.
> 

Sorry I am not clear of this way, can you explain more about this or give me
a special case, thanks a lot.

Ding

> - Alex
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-08 14:33                       ` Ding Tianhong
@ 2017-05-08 15:22                         ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-08 15:22 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Casey Leedom, Raj, Ashok, Bjorn Helgaas, Michael Werner,
	Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev

On Mon, May 8, 2017 at 7:33 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/7 2:07, Alexander Duyck wrote:
>> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>>
>>>
>>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>>> | ...
>>>>> | It sounds like we are more or less in agreement. My only concern is
>>>>> | really what we default this to. On x86 I would say we could probably
>>>>> | default this to disabled for existing platforms since my understanding
>>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>>> | there right now when performing DMA through the root complex. As far
>>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>>> | for you to indicate you want it disabled by default for most of your
>>>>> | DMA channels, and then enabled for the select channels that are
>>>>> | handling the peer-to-peer traffic.
>>>>>
>>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>>> sure that whatever scheme was developed would allow for simultaneously
>>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>>> Ordering for others within the same system.  I.e. not simply
>>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>>
>>>>>   By the way, I've started our QA folks off looking at what things look like
>>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>>> information they may provide to the VM in the way of what Root Complex Port
>>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>>> Maybe NULL?
>>>>
>>>> I believe NULL is one of the options. It all depends on what qemu is
>>>> emulating. Most likely you won't find a pcie root port on KVM because
>>>> the default is to emulate an older system that only supports PCI.
>>>>
>>>>>   With your reservations (which I also share), I think that it probably
>>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>>> Node but I may be missing something.
>>>>>
>>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>>
>>>>> Casey
>>>>
>>>> For now we can probably keep this on the linux-pci mailing list. Going
>>>> that route is the most straight forward for now since step one is
>>>> probably just making sure we are setting the relaxed ordering bit in
>>>> the setups that make sense. I would say we could probably keep it
>>>> simple. We just need to enable relaxed ordering by default for SPARC
>>>> architectures, on most others we can probably default it to off.
>>>>
>>>
>>> Casey, Alexander:
>>>
>>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>>> is more safe for all the other platform, as no one want to break anything.
>>>
>>>> I believe this all had started as Ding Tianhong was hoping to enable
>>>> this for the ARM architecture. That is the only one I can think of
>>>> where it might be difficult to figure out which way to default as we
>>>> were attempting to follow the same code that was enabled for SPARC and
>>>> that is what started this tug-of-war about how this should be done.
>>>> What we might do is take care of this in two phases. The first one
>>>> enables the infrastructure generically but leaves it defaulted to off
>>>> for everyone but SPARC. Then we can go through and start enabling it
>>>> for other platforms such as some of those on ARM in the platforms that
>>>> Ding Tianhong was working with.
>>>>
>>>
>>> According the suggestion, I could only think of this code:
>>>
>>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>>                               quirk_tw686x_class);
>>>
>>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>>> +{
>>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>>> +}
>>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>>> +                       quirk_relaxedordering_disable);
>>> +
>>>  /*
>>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>>   * values for the Attribute as were supplied in the header of the
>>>
>>>
>>> What do you think of it?
>>>
>>> Thanks
>>> Ding
>>>
>>
>> This is a bit simplistic but it is a start.
>>
>> The other bit I was getting at is that we need to update the core PCIe
>> code so that when we configure devices and the root complex reports no
>> support for relaxed ordering it should be clearing the relaxed
>> ordering bits in the PCIe configuration registers on the upstream
>> facing devices.
>
> How about this:
> rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
> when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
> that the pcie dev did not support RO mode.

The problem is we need to have something that can be communicated
through a VM. Your change doesn't work in that regard. That was why I
suggested just updating the code so that we when we initialized PCIe
devices what we do is either set or clear the relaxed ordering bit in
the PCIe device control register. That way when we direct assign an
interface it could know just based on the bits int the PCIe
configuration if it could use relaxed ordering or not.

At that point the driver code itself becomes very simple since you
could just enable the relaxed ordering by default in the igb/ixgbe
driver and if the bit is set or cleared in the PCIe configuration then
we are either sending with relaxed ordering requests or not and don't
have to try and locate the root complex.

>>
>> The last bit we need in all this is a way to allow for setups where
>> peer-to-peer wants to perform relaxed ordering but for writes to the
>> host we have to not use relaxed ordering. For that we need to enable a
>> special case and that isn't handled right now in any of the solutions
>> we have coded up so far.
>>
>
> Sorry I am not clear of this way, can you explain more about this or give me
> a special case, thanks a lot.

So from the sound of it Casey has a special use case where he doesn't
want to send relaxed ordering frames to the root complex, but instead
would like to send them to another PCIe device. To do that he needs to
have a way to enable the relaxed ordering bit in the PCIe
configuration but then not send any to the root complex. Odds are that
is something he might be able to just implement in the driver, but is
something that may become a more general case in the future. I don't
see our change here impacting it as long as we keep the solution
generic and mostly confined to when we instantiate the devices as the
driver could likely make the decision to change the behavior later.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-08 15:22                         ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-08 15:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, May 8, 2017 at 7:33 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
>
> On 2017/5/7 2:07, Alexander Duyck wrote:
>> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>>
>>>
>>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <leedom@chelsio.com> wrote:
>>>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>>>> | ...
>>>>> | It sounds like we are more or less in agreement. My only concern is
>>>>> | really what we default this to. On x86 I would say we could probably
>>>>> | default this to disabled for existing platforms since my understanding
>>>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>>>> | there right now when performing DMA through the root complex. As far
>>>>> | as peer-to-peer I would say we should probably look at enabling the
>>>>> | ability to have Relaxed Ordering enabled for some channels but not
>>>>> | others. In those cases the hardware needs to be smart enough to allow
>>>>> | for you to indicate you want it disabled by default for most of your
>>>>> | DMA channels, and then enabled for the select channels that are
>>>>> | handling the peer-to-peer traffic.
>>>>>
>>>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>>>> sure that whatever scheme was developed would allow for simultaneously
>>>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>>>> Ordering for others within the same system.  I.e. not simply
>>>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>>>
>>>>>   By the way, I've started our QA folks off looking at what things look like
>>>>> in Linux Virtual Machines under different Hypervisors to see what
>>>>> information they may provide to the VM in the way of what Root Complex Port
>>>>> is being used, etc.  So far they've got Windows HyperV done and there
>>>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>>>> Maybe NULL?
>>>>
>>>> I believe NULL is one of the options. It all depends on what qemu is
>>>> emulating. Most likely you won't find a pcie root port on KVM because
>>>> the default is to emulate an older system that only supports PCI.
>>>>
>>>>>   With your reservations (which I also share), I think that it probably
>>>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>>>> being "No" for any architecture which doesn't implement the predicate.  And
>>>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>>>> False for that as well.  I can't see any reason to pass in the Source End
>>>>> Node but I may be missing something.
>>>>>
>>>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>>>> experience in the PCIe Infrastructure were to want to carry the ball
>>>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>>>> Engagement", so let me know what my next step should be.  Thanks.
>>>>>
>>>>> Casey
>>>>
>>>> For now we can probably keep this on the linux-pci mailing list. Going
>>>> that route is the most straight forward for now since step one is
>>>> probably just making sure we are setting the relaxed ordering bit in
>>>> the setups that make sense. I would say we could probably keep it
>>>> simple. We just need to enable relaxed ordering by default for SPARC
>>>> architectures, on most others we can probably default it to off.
>>>>
>>>
>>> Casey, Alexander:
>>>
>>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>>> is more safe for all the other platform, as no one want to break anything.
>>>
>>>> I believe this all had started as Ding Tianhong was hoping to enable
>>>> this for the ARM architecture. That is the only one I can think of
>>>> where it might be difficult to figure out which way to default as we
>>>> were attempting to follow the same code that was enabled for SPARC and
>>>> that is what started this tug-of-war about how this should be done.
>>>> What we might do is take care of this in two phases. The first one
>>>> enables the infrastructure generically but leaves it defaulted to off
>>>> for everyone but SPARC. Then we can go through and start enabling it
>>>> for other platforms such as some of those on ARM in the platforms that
>>>> Ding Tianhong was working with.
>>>>
>>>
>>> According the suggestion, I could only think of this code:
>>>
>>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>>                               quirk_tw686x_class);
>>>
>>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>>> +{
>>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>>> +     dev->vendor != PCI_VENDOR_ID_SUN)
>>> +         dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>>> +}
>>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 8,
>>> +                       quirk_relaxedordering_disable);
>>> +
>>>  /*
>>>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>>>   * values for the Attribute as were supplied in the header of the
>>>
>>>
>>> What do you think of it?
>>>
>>> Thanks
>>> Ding
>>>
>>
>> This is a bit simplistic but it is a start.
>>
>> The other bit I was getting at is that we need to update the core PCIe
>> code so that when we configure devices and the root complex reports no
>> support for relaxed ordering it should be clearing the relaxed
>> ordering bits in the PCIe configuration registers on the upstream
>> facing devices.
>
> How about this:
> rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it
> when pcie root configure if it support the RO mode, otherwise we will not set it to indicate
> that the pcie dev did not support RO mode.

The problem is we need to have something that can be communicated
through a VM. Your change doesn't work in that regard. That was why I
suggested just updating the code so that we when we initialized PCIe
devices what we do is either set or clear the relaxed ordering bit in
the PCIe device control register. That way when we direct assign an
interface it could know just based on the bits int the PCIe
configuration if it could use relaxed ordering or not.

At that point the driver code itself becomes very simple since you
could just enable the relaxed ordering by default in the igb/ixgbe
driver and if the bit is set or cleared in the PCIe configuration then
we are either sending with relaxed ordering requests or not and don't
have to try and locate the root complex.

>>
>> The last bit we need in all this is a way to allow for setups where
>> peer-to-peer wants to perform relaxed ordering but for writes to the
>> host we have to not use relaxed ordering. For that we need to enable a
>> special case and that isn't handled right now in any of the solutions
>> we have coded up so far.
>>
>
> Sorry I am not clear of this way, can you explain more about this or give me
> a special case, thanks a lot.

So from the sound of it Casey has a special use case where he doesn't
want to send relaxed ordering frames to the root complex, but instead
would like to send them to another PCIe device. To do that he needs to
have a way to enable the relaxed ordering bit in the PCIe
configuration but then not send any to the root complex. Odds are that
is something he might be able to just implement in the driver, but is
something that may become a more general case in the future. I don't
see our change here impacting it as long as we keep the solution
generic and mostly confined to when we instantiate the devices as the
driver could likely make the decision to change the behavior later.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-06 18:07                     ` Alexander Duyck
@ 2017-05-09  0:48                       ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-09  0:48 UTC (permalink / raw)
  To: Alexander Duyck, Ding Tianhong
  Cc: Raj, Ashok, Bjorn Helgaas, Michael Werner, Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev, Robin Murphy


| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 6, 2017 11:07 AM
| 
| | From: Ding Tianhong <dingtianhong@huawei.com>
| | Date: Fri, May 5, 2017 at 8:08 PM
| | 
| | According the suggestion, I could only think of this code:
| | ..
| 
| This is a bit simplistic but it is a start.

  Yes, something tells me that this is going to be more complicated than any
of us like ...

| The other bit I was getting at is that we need to update the core PCIe
| code so that when we configure devices and the root complex reports no
| support for relaxed ordering it should be clearing the relaxed
| ordering bits in the PCIe configuration registers on the upstream
| facing devices.

  Of course, this can be written to by the Driver at any time ... and is in
the case of the cxgb4 Driver ...

  After a lot of rummaging around, it does look like KVM prohibits writes to
the PCIe Capability Device Control register in drivers/xen/xen-pciback/
conf_space_capability.c and conf_space.c simply because writes aren't
allowed unless "permissive" is set.  So it ~looks~ like a driver running in
a Virtual Machine can't turn Enable Relaxed Ordering back on ...

| The last bit we need in all this is a way to allow for setups where
| peer-to-peer wants to perform relaxed ordering but for writes to the
| host we have to not use relaxed ordering. For that we need to enable a
| special case and that isn't handled right now in any of the solutions
| we have coded up so far.

  Yes, we do need this.


| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 8, 2017 08:22 AM
|
| The problem is we need to have something that can be communicated
| through a VM. Your change doesn't work in that regard. That was why I
| suggested just updating the code so that we when we initialized PCIe
| devices what we do is either set or clear the relaxed ordering bit in
| the PCIe device control register. That way when we direct assign an
| interface it could know just based on the bits int the PCIe
| configuration if it could use relaxed ordering or not.
| 
| At that point the driver code itself becomes very simple since you
| could just enable the relaxed ordering by default in the igb/ixgbe
| driver and if the bit is set or cleared in the PCIe configuration then
| we are either sending with relaxed ordering requests or not and don't
| have to try and locate the root complex.
| 
| So from the sound of it Casey has a special use case where he doesn't
| want to send relaxed ordering frames to the root complex, but instead
| would like to send them to another PCIe device. To do that he needs to
| have a way to enable the relaxed ordering bit in the PCIe
| configuration but then not send any to the root complex. Odds are that
| is something he might be able to just implement in the driver, but is
| something that may become a more general case in the future. I don't
| see our change here impacting it as long as we keep the solution
| generic and mostly confined to when we instantiate the devices as the
| driver could likely make the decision to change the behavior later.

  It's not just me.  Intel has said that while RO directed at the Root
Complex Host Coherent Memory has a performance bug (not Data Corruption),
it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
interested in hearing what the bug is if we get that much detail.  The very
same TLPs directed to the Root Complex Port without Relaxed Ordering set get
good performance.  So this is essentially a bug in the hardware that was
~trying~ to implement a performance win.)

  Meanwhile, I currently only know of a single PCIe End Point which causes
catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
clear that product is even alive anymore since I haven't been able to get
any responses from them for several months.

  What I'm saying is: let's try to architect a solution which doesn't throw
the baby out with the bath water ...

  I think that if a Device's Root Complex Port has problems with Relaxed
Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
Control[Enable Relaxed Ordering] when we assign a device to a Virtual
Machine since the Device Driver can no longer query the Relaxed Ordering
Support of the Root Complex Port.  The only down side of this would be if we
assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
transfers.  But that seems like a hard application to support in any case
since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
match the actual values.

  For Devices running in the base OS/Hypervisor, their Drivers can query the
Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
simple flag within the (struct pci_dev *)->dev_flags would serve for that
along with a per-Architecture/Platform mechanism for setting it ...

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-09  0:48                       ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-09  0:48 UTC (permalink / raw)
  To: linux-arm-kernel


| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 6, 2017 11:07 AM
| 
| | From: Ding Tianhong <dingtianhong@huawei.com>
| | Date: Fri, May 5, 2017 at 8:08 PM
| | 
| | According the suggestion, I could only think of this code:
| | ..
| 
| This is a bit simplistic but it is a start.

  Yes, something tells me that this is going to be more complicated than any
of us like ...

| The other bit I was getting at is that we need to update the core PCIe
| code so that when we configure devices and the root complex reports no
| support for relaxed ordering it should be clearing the relaxed
| ordering bits in the PCIe configuration registers on the upstream
| facing devices.

  Of course, this can be written to by the Driver at any time ... and is in
the case of the cxgb4 Driver ...

  After a lot of rummaging around, it does look like KVM prohibits writes to
the PCIe Capability Device Control register in drivers/xen/xen-pciback/
conf_space_capability.c and conf_space.c simply because writes aren't
allowed unless "permissive" is set.  So it ~looks~ like a driver running in
a Virtual Machine can't turn Enable Relaxed Ordering back on ...

| The last bit we need in all this is a way to allow for setups where
| peer-to-peer wants to perform relaxed ordering but for writes to the
| host we have to not use relaxed ordering. For that we need to enable a
| special case and that isn't handled right now in any of the solutions
| we have coded up so far.

  Yes, we do need this.


| From: Alexander Duyck <alexander.duyck@gmail.com>
| Date: Saturday, May 8, 2017 08:22 AM
|
| The problem is we need to have something that can be communicated
| through a VM. Your change doesn't work in that regard. That was why I
| suggested just updating the code so that we when we initialized PCIe
| devices what we do is either set or clear the relaxed ordering bit in
| the PCIe device control register. That way when we direct assign an
| interface it could know just based on the bits int the PCIe
| configuration if it could use relaxed ordering or not.
| 
| At that point the driver code itself becomes very simple since you
| could just enable the relaxed ordering by default in the igb/ixgbe
| driver and if the bit is set or cleared in the PCIe configuration then
| we are either sending with relaxed ordering requests or not and don't
| have to try and locate the root complex.
| 
| So from the sound of it Casey has a special use case where he doesn't
| want to send relaxed ordering frames to the root complex, but instead
| would like to send them to another PCIe device. To do that he needs to
| have a way to enable the relaxed ordering bit in the PCIe
| configuration but then not send any to the root complex. Odds are that
| is something he might be able to just implement in the driver, but is
| something that may become a more general case in the future. I don't
| see our change here impacting it as long as we keep the solution
| generic and mostly confined to when we instantiate the devices as the
| driver could likely make the decision to change the behavior later.

  It's not just me.  Intel has said that while RO directed at the Root
Complex Host Coherent Memory has a performance bug (not Data Corruption),
it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
interested in hearing what the bug is if we get that much detail.  The very
same TLPs directed to the Root Complex Port without Relaxed Ordering set get
good performance.  So this is essentially a bug in the hardware that was
~trying~ to implement a performance win.)

  Meanwhile, I currently only know of a single PCIe End Point which causes
catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
clear that product is even alive anymore since I haven't been able to get
any responses from them for several months.

  What I'm saying is: let's try to architect a solution which doesn't throw
the baby out with the bath water ...

  I think that if a Device's Root Complex Port has problems with Relaxed
Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
Control[Enable Relaxed Ordering] when we assign a device to a Virtual
Machine since the Device Driver can no longer query the Relaxed Ordering
Support of the Root Complex Port.  The only down side of this would be if we
assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
transfers.  But that seems like a hard application to support in any case
since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
match the actual values.

  For Devices running in the base OS/Hypervisor, their Drivers can query the
Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
simple flag within the (struct pci_dev *)->dev_flags would serve for that
along with a per-Architecture/Platform mechanism for setting it ...

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-09  0:48                       ` Casey Leedom
@ 2017-05-11  1:15                         ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-11  1:15 UTC (permalink / raw)
  To: Casey Leedom, Alexander Duyck
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Raj, Ashok, Bjorn Helgaas, Ganesh GR,
	Jeff Kirsher, Bob Shaw, Patrick J Cramer, Arjun V.,
	Michael Werner, linux-arm-kernel, Amir Ancel, Netdev,
	David Laight, Suravee Suthikulpanit, Robin Murphy, David Miller,
	h



On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong <dingtianhong@huawei.com>
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple flag within the (struct pci_dev *)->dev_flags would serve for that
> along with a per-Architecture/Platform mechanism for setting it ...
> 

Hi Casey:

Will you continue to work on this solution or send a new version patches?

Thanks
Ding
> Casey
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-11  1:15                         ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-11  1:15 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong <dingtianhong@huawei.com>
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple flag within the (struct pci_dev *)->dev_flags would serve for that
> along with a per-Architecture/Platform mechanism for setting it ...
> 

Hi Casey:

Will you continue to work on this solution or send a new version patches?

Thanks
Ding
> Casey
> 
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-11  1:15                         ` Ding Tianhong
@ 2017-05-16 18:38                           ` Casey Leedom
  -1 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-16 18:38 UTC (permalink / raw)
  To: Ding Tianhong, Alexander Duyck
  Cc: Raj, Ashok, Bjorn Helgaas, Michael Werner, Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev, Robin Murphy

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, May 10, 2017 6:15 PM
|
| Hi Casey:
|
| Will you continue to work on this solution or send a new version patches?

I won't be able to work on this any time soon given several other urgent
issues.  If you've got a desire to pick this up, I'd be happy to help code
review your efforts.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-16 18:38                           ` Casey Leedom
  0 siblings, 0 replies; 50+ messages in thread
From: Casey Leedom @ 2017-05-16 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Ding Tianhong <dingtianhong@huawei.com>
| Sent: Wednesday, May 10, 2017 6:15 PM
|
| Hi Casey:
|
| Will you continue to work on this solution or send a new version patches?

I won't be able to work on this any time soon given several other urgent
issues.  If you've got a desire to pick this up, I'd be happy to help code
review your efforts.

Casey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-16 18:38                           ` Casey Leedom
@ 2017-05-17 14:49                             ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-17 14:49 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Ding Tianhong, Raj, Ashok, Bjorn Helgaas, Michael Werner,
	Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev

On Tue, May 16, 2017 at 11:38 AM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Wednesday, May 10, 2017 6:15 PM
> |
> | Hi Casey:
> |
> | Will you continue to work on this solution or send a new version patches?
>
> I won't be able to work on this any time soon given several other urgent
> issues.  If you've got a desire to pick this up, I'd be happy to help code
> review your efforts.
>
> Casey

Thanks for the heads up. I'll see if I can find somebody in my team
that might be able to take on the task though I can't make any
promises either as we have quite a bit going on.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-17 14:49                             ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-17 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 16, 2017 at 11:38 AM, Casey Leedom <leedom@chelsio.com> wrote:
> | From: Ding Tianhong <dingtianhong@huawei.com>
> | Sent: Wednesday, May 10, 2017 6:15 PM
> |
> | Hi Casey:
> |
> | Will you continue to work on this solution or send a new version patches?
>
> I won't be able to work on this any time soon given several other urgent
> issues.  If you've got a desire to pick this up, I'd be happy to help code
> review your efforts.
>
> Casey

Thanks for the heads up. I'll see if I can find somebody in my team
that might be able to take on the task though I can't make any
promises either as we have quite a bit going on.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-09  0:48                       ` Casey Leedom
@ 2017-05-25 13:35                         ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-25 13:35 UTC (permalink / raw)
  To: Casey Leedom, Alexander Duyck
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Raj, Ashok, Bjorn Helgaas, Ganesh GR,
	Jeff Kirsher, Bob Shaw, Patrick J Cramer, Arjun V.,
	Michael Werner, linux-arm-kernel, Amir Ancel, Netdev,
	David Laight, Suravee Suthikulpanit, Robin Murphy, David Miller,
	h


On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong <dingtianhong@huawei.com>
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple flag within the (struct pci_dev *)->dev_flags would serve for that
> along with a per-Architecture/Platform mechanism for setting it ...
> 
> Casey
> 

I have take a time to talk to our kvm team about how to distinguish the relaxed
ordering in the VM for some vf just like 82599-vf, the probe routine looks like
could work like this:
1) QEMU could emulate the platform by the Vender ID and device ID which could be
   read from the host.
2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
   come and detach from the host to the guest.
3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
4) The VF drivers could read the flag and set to the hw.

So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
and don't enable by default, if I miss something, please not hesitate to enlighten me :)

--------------------------------------------------------------
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 085fb78..74bcc25 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4664,3 +4664,22 @@ static void quirk_intel_no_flr(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_no_flr);
+
+/*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set, so we should disable Relaxed Ordering by default
+ * and only enable it when some devices has mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_enable(struct pci_dev *dev)
+{
+ dev->dev_flags &= ~PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
+}
+
+/*
+ * Hisilicon Root Complex could support relaxed ordering which can
+ * improve performance with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_NOT_DEFINED, 8,
+                       quirk_relaxedordering_enable);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33c2b0b..f7d8d6f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -183,6 +183,8 @@ enum pci_dev_flags {
        PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
        /* Do not use FLR even if device advertises PCI_AF_CAP */
        PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
+ /* Use Relaxed Ordering for TLPs directed at this device */
+ PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED = (__force pci_dev_flags_t) (1 << 11),
 };

 enum pci_irq_reroute_variant {
@@ -2203,6 +2205,20 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
        return false;
 }

+/**
+ * pci_is_dev_relaxed_ordering_enabled - whether device could support Relaxed
+ * Ordering for TLPs directed.
+ * @pdev: PCI device to check
+ *
+ * This function could return the value indicates that whether Relaxed Ordering
+ * Attribute could be used on Transaction Layer Packets destined for the PCIe
+ * End Node.
+ */
+static inline boot pci_is_dev_relaxed_ordering_enabled(struct pci_dev *pdev)
+{
+ return (pdev->dev_flags & PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED) ==
+        PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
 /* provide the legacy pci_dma_* API */
 #include <linux/pci-dma-compat.h>

Thanks
Ding

> .
> 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-25 13:35                         ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-25 13:35 UTC (permalink / raw)
  To: linux-arm-kernel


On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong <dingtianhong@huawei.com>
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck <alexander.duyck@gmail.com>
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple flag within the (struct pci_dev *)->dev_flags would serve for that
> along with a per-Architecture/Platform mechanism for setting it ...
> 
> Casey
> 

I have take a time to talk to our kvm team about how to distinguish the relaxed
ordering in the VM for some vf just like 82599-vf, the probe routine looks like
could work like this:
1) QEMU could emulate the platform by the Vender ID and device ID which could be
   read from the host.
2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
   come and detach from the host to the guest.
3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
4) The VF drivers could read the flag and set to the hw.

So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
and don't enable by default, if I miss something, please not hesitate to enlighten me :)

--------------------------------------------------------------
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 085fb78..74bcc25 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4664,3 +4664,22 @@ static void quirk_intel_no_flr(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_no_flr);
+
+/*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set, so we should disable Relaxed Ordering by default
+ * and only enable it when some devices has mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_enable(struct pci_dev *dev)
+{
+ dev->dev_flags &= ~PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
+}
+
+/*
+ * Hisilicon Root Complex could support relaxed ordering which can
+ * improve performance with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_NOT_DEFINED, 8,
+                       quirk_relaxedordering_enable);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33c2b0b..f7d8d6f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -183,6 +183,8 @@ enum pci_dev_flags {
        PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
        /* Do not use FLR even if device advertises PCI_AF_CAP */
        PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
+ /* Use Relaxed Ordering for TLPs directed@this device */
+ PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED = (__force pci_dev_flags_t) (1 << 11),
 };

 enum pci_irq_reroute_variant {
@@ -2203,6 +2205,20 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
        return false;
 }

+/**
+ * pci_is_dev_relaxed_ordering_enabled - whether device could support Relaxed
+ * Ordering for TLPs directed.
+ * @pdev: PCI device to check
+ *
+ * This function could return the value indicates that whether Relaxed Ordering
+ * Attribute could be used on Transaction Layer Packets destined for the PCIe
+ * End Node.
+ */
+static inline boot pci_is_dev_relaxed_ordering_enabled(struct pci_dev *pdev)
+{
+ return (pdev->dev_flags & PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED) ==
+        PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
 /* provide the legacy pci_dma_* API */
 #include <linux/pci-dma-compat.h>

Thanks
Ding

> .
> 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-25 13:35                         ` Ding Tianhong
@ 2017-05-25 19:49                           ` Alexander Duyck
  -1 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-25 19:49 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Casey Leedom, Raj, Ashok, Bjorn Helgaas, Michael Werner,
	Ganesh GR, Arjun V.,
	Asit K Mallick, Patrick J Cramer, Suravee Suthikulpanit,
	Bob Shaw, h, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	Catalin Marinas, Will Deacon, LinuxArm, David Laight,
	Jeff Kirsher, Netdev

On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
> On 2017/5/9 8:48, Casey Leedom wrote:
>>
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Date: Saturday, May 6, 2017 11:07 AM
>> |
>> | | From: Ding Tianhong <dingtianhong@huawei.com>
>> | | Date: Fri, May 5, 2017 at 8:08 PM
>> | |
>> | | According the suggestion, I could only think of this code:
>> | | ..
>> |
>> | This is a bit simplistic but it is a start.
>>
>>   Yes, something tells me that this is going to be more complicated than any
>> of us like ...
>>
>> | The other bit I was getting at is that we need to update the core PCIe
>> | code so that when we configure devices and the root complex reports no
>> | support for relaxed ordering it should be clearing the relaxed
>> | ordering bits in the PCIe configuration registers on the upstream
>> | facing devices.
>>
>>   Of course, this can be written to by the Driver at any time ... and is in
>> the case of the cxgb4 Driver ...
>>
>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>> conf_space_capability.c and conf_space.c simply because writes aren't
>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>
>> | The last bit we need in all this is a way to allow for setups where
>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>> | host we have to not use relaxed ordering. For that we need to enable a
>> | special case and that isn't handled right now in any of the solutions
>> | we have coded up so far.
>>
>>   Yes, we do need this.
>>
>>
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Date: Saturday, May 8, 2017 08:22 AM
>> |
>> | The problem is we need to have something that can be communicated
>> | through a VM. Your change doesn't work in that regard. That was why I
>> | suggested just updating the code so that we when we initialized PCIe
>> | devices what we do is either set or clear the relaxed ordering bit in
>> | the PCIe device control register. That way when we direct assign an
>> | interface it could know just based on the bits int the PCIe
>> | configuration if it could use relaxed ordering or not.
>> |
>> | At that point the driver code itself becomes very simple since you
>> | could just enable the relaxed ordering by default in the igb/ixgbe
>> | driver and if the bit is set or cleared in the PCIe configuration then
>> | we are either sending with relaxed ordering requests or not and don't
>> | have to try and locate the root complex.
>> |
>> | So from the sound of it Casey has a special use case where he doesn't
>> | want to send relaxed ordering frames to the root complex, but instead
>> | would like to send them to another PCIe device. To do that he needs to
>> | have a way to enable the relaxed ordering bit in the PCIe
>> | configuration but then not send any to the root complex. Odds are that
>> | is something he might be able to just implement in the driver, but is
>> | something that may become a more general case in the future. I don't
>> | see our change here impacting it as long as we keep the solution
>> | generic and mostly confined to when we instantiate the devices as the
>> | driver could likely make the decision to change the behavior later.
>>
>>   It's not just me.  Intel has said that while RO directed at the Root
>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>> interested in hearing what the bug is if we get that much detail.  The very
>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>> good performance.  So this is essentially a bug in the hardware that was
>> ~trying~ to implement a performance win.)
>>
>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>> clear that product is even alive anymore since I haven't been able to get
>> any responses from them for several months.
>>
>>   What I'm saying is: let's try to architect a solution which doesn't throw
>> the baby out with the bath water ...
>>
>>   I think that if a Device's Root Complex Port has problems with Relaxed
>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>> Machine since the Device Driver can no longer query the Relaxed Ordering
>> Support of the Root Complex Port.  The only down side of this would be if we
>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>> transfers.  But that seems like a hard application to support in any case
>> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
>> match the actual values.
>>
>>   For Devices running in the base OS/Hypervisor, their Drivers can query the
>> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
>> simple flag within the (struct pci_dev *)->dev_flags would serve for that
>> along with a per-Architecture/Platform mechanism for setting it ...
>>
>> Casey
>>
>
> I have take a time to talk to our kvm team about how to distinguish the relaxed
> ordering in the VM for some vf just like 82599-vf, the probe routine looks like
> could work like this:
> 1) QEMU could emulate the platform by the Vender ID and device ID which could be
>    read from the host.
> 2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
>    come and detach from the host to the guest.
> 3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
> 4) The VF drivers could read the flag and set to the hw.
>
> So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
> and don't enable by default, if I miss something, please not hesitate to enlighten me :)

This isn't what I had in mind at all.

So what Casey had originally submitted was a step in the direction of
what I was thinking. Basically on platforms where it is not advisable
to enable relaxed ordering we need to advertise that relaxed ordering
is not safe. Then when we are initializing the devices underneath
those we need to be clearing the relaxed ordering enable bits in the
PCI configuration, that is the piece that was missing from Casey's
original patch. In addition we then need to have a way for devices to
optionally enable relaxed ordering for cases like Casey pointed out
where they might want to use relaxed ordering for peer-to-peer
transactions, but not for transactions to the root complex. Finally in
the case of the Intel drivers we could then just drop the compile time
checks entirely and just enable the device to configure the internal
registers for relaxed ordering because the configuration space becomes
the spot that controls if this gets enabled or not.

So the initial set of patches Casey submitted only really consisted of
2 patches. What I am proposing is that we would be looking at
expanding this out to about 4 patches. The first patch is the original
1 of 2, the second patch would be to modify the PCI initialization
code to clear the relaxed ordering enable bit in the event that the
root complex doesn't want relaxed ordering enabled, the third would be
to make changes to the Chelsio driver as needed to allow for the
peer-to-peer case to be enabled when the relaxed ordering bit in the
configuration space is not enabled without triggering any relaxed
ordering requests to the root complex, and the last one would be to
drop the defines in ixgbe and whatever other Intel drivers are
currently checking for either SPARC or the define that was added to
support relaxed ordering and just act like we are going to do it
always with the PCI configuration space controlling if we do or not.

Ideally as a part of the second patch we should have a way of testing
if a given path can support relaxed ordering. That way when we go to
try to enable a peer-to-peer setup we can be certain that a given path
will work and don't try enabling it in paths that would be unsupported
for peer-to-peer.

This ends up being somewhat of a risk for the Intel NICs, but if the
Chelsio devices have been running with relaxed ordering enabled for
some time and have identified the chipsets that should be issues, then
odds are we should be fine as well.

> --------------------------------------------------------------
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 085fb78..74bcc25 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4664,3 +4664,22 @@ static void quirk_intel_no_flr(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_no_flr);
> +
> +/*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set, so we should disable Relaxed Ordering by default
> + * and only enable it when some devices has mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_enable(struct pci_dev *dev)
> +{
> + dev->dev_flags &= ~PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
> +}
> +
> +/*
> + * Hisilicon Root Complex could support relaxed ordering which can
> + * improve performance with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_NOT_DEFINED, 8,
> +                       quirk_relaxedordering_enable);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 33c2b0b..f7d8d6f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -183,6 +183,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
>         /* Do not use FLR even if device advertises PCI_AF_CAP */
>         PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> + /* Use Relaxed Ordering for TLPs directed at this device */
> + PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED = (__force pci_dev_flags_t) (1 << 11),
>  };
>
>  enum pci_irq_reroute_variant {
> @@ -2203,6 +2205,20 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
>         return false;
>  }
>
> +/**
> + * pci_is_dev_relaxed_ordering_enabled - whether device could support Relaxed
> + * Ordering for TLPs directed.
> + * @pdev: PCI device to check
> + *
> + * This function could return the value indicates that whether Relaxed Ordering
> + * Attribute could be used on Transaction Layer Packets destined for the PCIe
> + * End Node.
> + */
> +static inline boot pci_is_dev_relaxed_ordering_enabled(struct pci_dev *pdev)
> +{
> + return (pdev->dev_flags & PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED) ==
> +        PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
>  /* provide the legacy pci_dma_* API */
>  #include <linux/pci-dma-compat.h>
>
> Thanks
> Ding
>
>> .
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-25 19:49                           ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2017-05-25 19:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>
> On 2017/5/9 8:48, Casey Leedom wrote:
>>
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Date: Saturday, May 6, 2017 11:07 AM
>> |
>> | | From: Ding Tianhong <dingtianhong@huawei.com>
>> | | Date: Fri, May 5, 2017 at 8:08 PM
>> | |
>> | | According the suggestion, I could only think of this code:
>> | | ..
>> |
>> | This is a bit simplistic but it is a start.
>>
>>   Yes, something tells me that this is going to be more complicated than any
>> of us like ...
>>
>> | The other bit I was getting at is that we need to update the core PCIe
>> | code so that when we configure devices and the root complex reports no
>> | support for relaxed ordering it should be clearing the relaxed
>> | ordering bits in the PCIe configuration registers on the upstream
>> | facing devices.
>>
>>   Of course, this can be written to by the Driver at any time ... and is in
>> the case of the cxgb4 Driver ...
>>
>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>> conf_space_capability.c and conf_space.c simply because writes aren't
>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>
>> | The last bit we need in all this is a way to allow for setups where
>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>> | host we have to not use relaxed ordering. For that we need to enable a
>> | special case and that isn't handled right now in any of the solutions
>> | we have coded up so far.
>>
>>   Yes, we do need this.
>>
>>
>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>> | Date: Saturday, May 8, 2017 08:22 AM
>> |
>> | The problem is we need to have something that can be communicated
>> | through a VM. Your change doesn't work in that regard. That was why I
>> | suggested just updating the code so that we when we initialized PCIe
>> | devices what we do is either set or clear the relaxed ordering bit in
>> | the PCIe device control register. That way when we direct assign an
>> | interface it could know just based on the bits int the PCIe
>> | configuration if it could use relaxed ordering or not.
>> |
>> | At that point the driver code itself becomes very simple since you
>> | could just enable the relaxed ordering by default in the igb/ixgbe
>> | driver and if the bit is set or cleared in the PCIe configuration then
>> | we are either sending with relaxed ordering requests or not and don't
>> | have to try and locate the root complex.
>> |
>> | So from the sound of it Casey has a special use case where he doesn't
>> | want to send relaxed ordering frames to the root complex, but instead
>> | would like to send them to another PCIe device. To do that he needs to
>> | have a way to enable the relaxed ordering bit in the PCIe
>> | configuration but then not send any to the root complex. Odds are that
>> | is something he might be able to just implement in the driver, but is
>> | something that may become a more general case in the future. I don't
>> | see our change here impacting it as long as we keep the solution
>> | generic and mostly confined to when we instantiate the devices as the
>> | driver could likely make the decision to change the behavior later.
>>
>>   It's not just me.  Intel has said that while RO directed at the Root
>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>> interested in hearing what the bug is if we get that much detail.  The very
>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>> good performance.  So this is essentially a bug in the hardware that was
>> ~trying~ to implement a performance win.)
>>
>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>> clear that product is even alive anymore since I haven't been able to get
>> any responses from them for several months.
>>
>>   What I'm saying is: let's try to architect a solution which doesn't throw
>> the baby out with the bath water ...
>>
>>   I think that if a Device's Root Complex Port has problems with Relaxed
>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>> Machine since the Device Driver can no longer query the Relaxed Ordering
>> Support of the Root Complex Port.  The only down side of this would be if we
>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>> transfers.  But that seems like a hard application to support in any case
>> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
>> match the actual values.
>>
>>   For Devices running in the base OS/Hypervisor, their Drivers can query the
>> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
>> simple flag within the (struct pci_dev *)->dev_flags would serve for that
>> along with a per-Architecture/Platform mechanism for setting it ...
>>
>> Casey
>>
>
> I have take a time to talk to our kvm team about how to distinguish the relaxed
> ordering in the VM for some vf just like 82599-vf, the probe routine looks like
> could work like this:
> 1) QEMU could emulate the platform by the Vender ID and device ID which could be
>    read from the host.
> 2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
>    come and detach from the host to the guest.
> 3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
> 4) The VF drivers could read the flag and set to the hw.
>
> So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
> and don't enable by default, if I miss something, please not hesitate to enlighten me :)

This isn't what I had in mind at all.

So what Casey had originally submitted was a step in the direction of
what I was thinking. Basically on platforms where it is not advisable
to enable relaxed ordering we need to advertise that relaxed ordering
is not safe. Then when we are initializing the devices underneath
those we need to be clearing the relaxed ordering enable bits in the
PCI configuration, that is the piece that was missing from Casey's
original patch. In addition we then need to have a way for devices to
optionally enable relaxed ordering for cases like Casey pointed out
where they might want to use relaxed ordering for peer-to-peer
transactions, but not for transactions to the root complex. Finally in
the case of the Intel drivers we could then just drop the compile time
checks entirely and just enable the device to configure the internal
registers for relaxed ordering because the configuration space becomes
the spot that controls if this gets enabled or not.

So the initial set of patches Casey submitted only really consisted of
2 patches. What I am proposing is that we would be looking at
expanding this out to about 4 patches. The first patch is the original
1 of 2, the second patch would be to modify the PCI initialization
code to clear the relaxed ordering enable bit in the event that the
root complex doesn't want relaxed ordering enabled, the third would be
to make changes to the Chelsio driver as needed to allow for the
peer-to-peer case to be enabled when the relaxed ordering bit in the
configuration space is not enabled without triggering any relaxed
ordering requests to the root complex, and the last one would be to
drop the defines in ixgbe and whatever other Intel drivers are
currently checking for either SPARC or the define that was added to
support relaxed ordering and just act like we are going to do it
always with the PCI configuration space controlling if we do or not.

Ideally as a part of the second patch we should have a way of testing
if a given path can support relaxed ordering. That way when we go to
try to enable a peer-to-peer setup we can be certain that a given path
will work and don't try enabling it in paths that would be unsupported
for peer-to-peer.

This ends up being somewhat of a risk for the Intel NICs, but if the
Chelsio devices have been running with relaxed ordering enabled for
some time and have identified the chipsets that should be issues, then
odds are we should be fine as well.

> --------------------------------------------------------------
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 085fb78..74bcc25 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4664,3 +4664,22 @@ static void quirk_intel_no_flr(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_no_flr);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_no_flr);
> +
> +/*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set, so we should disable Relaxed Ordering by default
> + * and only enable it when some devices has mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_enable(struct pci_dev *dev)
> +{
> + dev->dev_flags &= ~PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
> +}
> +
> +/*
> + * Hisilicon Root Complex could support relaxed ordering which can
> + * improve performance with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HUAWEI, 0x1610, PCI_CLASS_NOT_DEFINED, 8,
> +                       quirk_relaxedordering_enable);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 33c2b0b..f7d8d6f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -183,6 +183,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
>         /* Do not use FLR even if device advertises PCI_AF_CAP */
>         PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> + /* Use Relaxed Ordering for TLPs directed at this device */
> + PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED = (__force pci_dev_flags_t) (1 << 11),
>  };
>
>  enum pci_irq_reroute_variant {
> @@ -2203,6 +2205,20 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
>         return false;
>  }
>
> +/**
> + * pci_is_dev_relaxed_ordering_enabled - whether device could support Relaxed
> + * Ordering for TLPs directed.
> + * @pdev: PCI device to check
> + *
> + * This function could return the value indicates that whether Relaxed Ordering
> + * Attribute could be used on Transaction Layer Packets destined for the PCIe
> + * End Node.
> + */
> +static inline boot pci_is_dev_relaxed_ordering_enabled(struct pci_dev *pdev)
> +{
> + return (pdev->dev_flags & PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED) ==
> +        PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED;
>  /* provide the legacy pci_dma_* API */
>  #include <linux/pci-dma-compat.h>
>
> Thanks
> Ding
>
>> .
>>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  2017-05-25 19:49                           ` Alexander Duyck
@ 2017-05-27 10:34                             ` Ding Tianhong
  -1 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-27 10:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Mark Rutland, Gabriele Paoloni, Asit K Mallick, Catalin Marinas,
	Will Deacon, LinuxArm, Raj, Ashok, Bjorn Helgaas, Ganesh GR,
	Jeff Kirsher, Bob Shaw, Casey Leedom, Patrick J Cramer, Arjun V.,
	Michael Werner, linux-arm-kernel, Amir Ancel, Netdev,
	David Laight, Suravee Suthikulpanit, Robin Murphy, David Miller,
	h



On 2017/5/26 3:49, Alexander Duyck wrote:
> On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>> On 2017/5/9 8:48, Casey Leedom wrote:
>>>
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Date: Saturday, May 6, 2017 11:07 AM
>>> |
>>> | | From: Ding Tianhong <dingtianhong@huawei.com>
>>> | | Date: Fri, May 5, 2017 at 8:08 PM
>>> | |
>>> | | According the suggestion, I could only think of this code:
>>> | | ..
>>> |
>>> | This is a bit simplistic but it is a start.
>>>
>>>   Yes, something tells me that this is going to be more complicated than any
>>> of us like ...
>>>
>>> | The other bit I was getting at is that we need to update the core PCIe
>>> | code so that when we configure devices and the root complex reports no
>>> | support for relaxed ordering it should be clearing the relaxed
>>> | ordering bits in the PCIe configuration registers on the upstream
>>> | facing devices.
>>>
>>>   Of course, this can be written to by the Driver at any time ... and is in
>>> the case of the cxgb4 Driver ...
>>>
>>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>>> conf_space_capability.c and conf_space.c simply because writes aren't
>>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>>
>>> | The last bit we need in all this is a way to allow for setups where
>>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>>> | host we have to not use relaxed ordering. For that we need to enable a
>>> | special case and that isn't handled right now in any of the solutions
>>> | we have coded up so far.
>>>
>>>   Yes, we do need this.
>>>
>>>
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Date: Saturday, May 8, 2017 08:22 AM
>>> |
>>> | The problem is we need to have something that can be communicated
>>> | through a VM. Your change doesn't work in that regard. That was why I
>>> | suggested just updating the code so that we when we initialized PCIe
>>> | devices what we do is either set or clear the relaxed ordering bit in
>>> | the PCIe device control register. That way when we direct assign an
>>> | interface it could know just based on the bits int the PCIe
>>> | configuration if it could use relaxed ordering or not.
>>> |
>>> | At that point the driver code itself becomes very simple since you
>>> | could just enable the relaxed ordering by default in the igb/ixgbe
>>> | driver and if the bit is set or cleared in the PCIe configuration then
>>> | we are either sending with relaxed ordering requests or not and don't
>>> | have to try and locate the root complex.
>>> |
>>> | So from the sound of it Casey has a special use case where he doesn't
>>> | want to send relaxed ordering frames to the root complex, but instead
>>> | would like to send them to another PCIe device. To do that he needs to
>>> | have a way to enable the relaxed ordering bit in the PCIe
>>> | configuration but then not send any to the root complex. Odds are that
>>> | is something he might be able to just implement in the driver, but is
>>> | something that may become a more general case in the future. I don't
>>> | see our change here impacting it as long as we keep the solution
>>> | generic and mostly confined to when we instantiate the devices as the
>>> | driver could likely make the decision to change the behavior later.
>>>
>>>   It's not just me.  Intel has said that while RO directed at the Root
>>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>>> interested in hearing what the bug is if we get that much detail.  The very
>>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>>> good performance.  So this is essentially a bug in the hardware that was
>>> ~trying~ to implement a performance win.)
>>>
>>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>>> clear that product is even alive anymore since I haven't been able to get
>>> any responses from them for several months.
>>>
>>>   What I'm saying is: let's try to architect a solution which doesn't throw
>>> the baby out with the bath water ...
>>>
>>>   I think that if a Device's Root Complex Port has problems with Relaxed
>>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>>> Machine since the Device Driver can no longer query the Relaxed Ordering
>>> Support of the Root Complex Port.  The only down side of this would be if we
>>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>>> transfers.  But that seems like a hard application to support in any case
>>> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
>>> match the actual values.
>>>
>>>   For Devices running in the base OS/Hypervisor, their Drivers can query the
>>> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
>>> simple flag within the (struct pci_dev *)->dev_flags would serve for that
>>> along with a per-Architecture/Platform mechanism for setting it ...
>>>
>>> Casey
>>>
>>
>> I have take a time to talk to our kvm team about how to distinguish the relaxed
>> ordering in the VM for some vf just like 82599-vf, the probe routine looks like
>> could work like this:
>> 1) QEMU could emulate the platform by the Vender ID and device ID which could be
>>    read from the host.
>> 2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
>>    come and detach from the host to the guest.
>> 3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
>> 4) The VF drivers could read the flag and set to the hw.
>>
>> So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
>> and don't enable by default, if I miss something, please not hesitate to enlighten me :)
> 
> This isn't what I had in mind at all.
> 
> So what Casey had originally submitted was a step in the direction of
> what I was thinking. Basically on platforms where it is not advisable
> to enable relaxed ordering we need to advertise that relaxed ordering
> is not safe. Then when we are initializing the devices underneath
> those we need to be clearing the relaxed ordering enable bits in the
> PCI configuration, that is the piece that was missing from Casey's
> original patch. In addition we then need to have a way for devices to
> optionally enable relaxed ordering for cases like Casey pointed out
> where they might want to use relaxed ordering for peer-to-peer
> transactions, but not for transactions to the root complex. Finally in
> the case of the Intel drivers we could then just drop the compile time
> checks entirely and just enable the device to configure the internal
> registers for relaxed ordering because the configuration space becomes
> the spot that controls if this gets enabled or not.
> 
> So the initial set of patches Casey submitted only really consisted of
> 2 patches. What I am proposing is that we would be looking at
> expanding this out to about 4 patches. The first patch is the original
> 1 of 2, the second patch would be to modify the PCI initialization
> code to clear the relaxed ordering enable bit in the event that the
> root complex doesn't want relaxed ordering enabled, the third would be
> to make changes to the Chelsio driver as needed to allow for the
> peer-to-peer case to be enabled when the relaxed ordering bit in the
> configuration space is not enabled without triggering any relaxed
> ordering requests to the root complex, and the last one would be to
> drop the defines in ixgbe and whatever other Intel drivers are
> currently checking for either SPARC or the define that was added to
> support relaxed ordering and just act like we are going to do it
> always with the PCI configuration space controlling if we do or not.
> 
> Ideally as a part of the second patch we should have a way of testing
> if a given path can support relaxed ordering. That way when we go to
> try to enable a peer-to-peer setup we can be certain that a given path
> will work and don't try enabling it in paths that would be unsupported
> for peer-to-peer.
> 
> This ends up being somewhat of a risk for the Intel NICs, but if the
> Chelsio devices have been running with relaxed ordering enabled for
> some time and have identified the chipsets that should be issues, then
> odds are we should be fine as well.
> 
According to your opinion, I try to build the second patch again,
1. in the pcie probe time, the pci_configure_relaxed_ordering will used to set the relaxed ordering
bit for configuration space.
2. export the pcie_get_relaxed_ordering for drivers and the drivers could decide whether should enable the relaxed ordering.

If I still go to the wrong path, please correct me, thanks.

---------------------------------------------------------------
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b01bd5b..0076e4a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4852,6 +4852,28 @@ int pcie_get_mps(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pcie_get_mps);

+int pcie_set_relaxed_ordering(struct pci_dev *dev)
+{
+ return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_set_relaxed_ordering);
+
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+ return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+int pcie_get_relaxed_ordering(struct pci_dev *dev)
+{
+ u16 v;
+
+ pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+ return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
+}
+EXPORT_SYMBOL(pcie_get_relaxed_ordering);
+
 /**
  * pcie_set_mps - set PCI Express maximum payload size
  * @dev: PCI device to query
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19c8950..aeb22b5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1701,6 +1701,16 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
                                         PCI_EXP_DEVCTL_EXT_TAG);
 }

+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+ int ret;
+
+ if (dev && (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))
+         pcie_set_relaxed_ordering(dev);
+ else
+         pcie_clear_relaxed_ordering(dev);
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
        struct hotplug_params hpp;
@@ -1708,6 +1718,7 @@ static void pci_configure_device(struct pci_dev *dev)

        pci_configure_mps(dev);
        pci_configure_extended_tags(dev);
+ pci_configure_relaxed_ordering(dev);

        memset(&hpp, 0, sizeof(hpp));
        ret = pci_get_hp_params(dev, &hpp);


Thanks
Ding


>>> .
>>>
>>
> 
> .
> 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
@ 2017-05-27 10:34                             ` Ding Tianhong
  0 siblings, 0 replies; 50+ messages in thread
From: Ding Tianhong @ 2017-05-27 10:34 UTC (permalink / raw)
  To: linux-arm-kernel



On 2017/5/26 3:49, Alexander Duyck wrote:
> On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>>
>> On 2017/5/9 8:48, Casey Leedom wrote:
>>>
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Date: Saturday, May 6, 2017 11:07 AM
>>> |
>>> | | From: Ding Tianhong <dingtianhong@huawei.com>
>>> | | Date: Fri, May 5, 2017 at 8:08 PM
>>> | |
>>> | | According the suggestion, I could only think of this code:
>>> | | ..
>>> |
>>> | This is a bit simplistic but it is a start.
>>>
>>>   Yes, something tells me that this is going to be more complicated than any
>>> of us like ...
>>>
>>> | The other bit I was getting at is that we need to update the core PCIe
>>> | code so that when we configure devices and the root complex reports no
>>> | support for relaxed ordering it should be clearing the relaxed
>>> | ordering bits in the PCIe configuration registers on the upstream
>>> | facing devices.
>>>
>>>   Of course, this can be written to by the Driver at any time ... and is in
>>> the case of the cxgb4 Driver ...
>>>
>>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>>> conf_space_capability.c and conf_space.c simply because writes aren't
>>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>>
>>> | The last bit we need in all this is a way to allow for setups where
>>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>>> | host we have to not use relaxed ordering. For that we need to enable a
>>> | special case and that isn't handled right now in any of the solutions
>>> | we have coded up so far.
>>>
>>>   Yes, we do need this.
>>>
>>>
>>> | From: Alexander Duyck <alexander.duyck@gmail.com>
>>> | Date: Saturday, May 8, 2017 08:22 AM
>>> |
>>> | The problem is we need to have something that can be communicated
>>> | through a VM. Your change doesn't work in that regard. That was why I
>>> | suggested just updating the code so that we when we initialized PCIe
>>> | devices what we do is either set or clear the relaxed ordering bit in
>>> | the PCIe device control register. That way when we direct assign an
>>> | interface it could know just based on the bits int the PCIe
>>> | configuration if it could use relaxed ordering or not.
>>> |
>>> | At that point the driver code itself becomes very simple since you
>>> | could just enable the relaxed ordering by default in the igb/ixgbe
>>> | driver and if the bit is set or cleared in the PCIe configuration then
>>> | we are either sending with relaxed ordering requests or not and don't
>>> | have to try and locate the root complex.
>>> |
>>> | So from the sound of it Casey has a special use case where he doesn't
>>> | want to send relaxed ordering frames to the root complex, but instead
>>> | would like to send them to another PCIe device. To do that he needs to
>>> | have a way to enable the relaxed ordering bit in the PCIe
>>> | configuration but then not send any to the root complex. Odds are that
>>> | is something he might be able to just implement in the driver, but is
>>> | something that may become a more general case in the future. I don't
>>> | see our change here impacting it as long as we keep the solution
>>> | generic and mostly confined to when we instantiate the devices as the
>>> | driver could likely make the decision to change the behavior later.
>>>
>>>   It's not just me.  Intel has said that while RO directed at the Root
>>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>>> interested in hearing what the bug is if we get that much detail.  The very
>>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>>> good performance.  So this is essentially a bug in the hardware that was
>>> ~trying~ to implement a performance win.)
>>>
>>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>>> clear that product is even alive anymore since I haven't been able to get
>>> any responses from them for several months.
>>>
>>>   What I'm saying is: let's try to architect a solution which doesn't throw
>>> the baby out with the bath water ...
>>>
>>>   I think that if a Device's Root Complex Port has problems with Relaxed
>>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>>> Machine since the Device Driver can no longer query the Relaxed Ordering
>>> Support of the Root Complex Port.  The only down side of this would be if we
>>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>>> transfers.  But that seems like a hard application to support in any case
>>> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
>>> match the actual values.
>>>
>>>   For Devices running in the base OS/Hypervisor, their Drivers can query the
>>> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
>>> simple flag within the (struct pci_dev *)->dev_flags would serve for that
>>> along with a per-Architecture/Platform mechanism for setting it ...
>>>
>>> Casey
>>>
>>
>> I have take a time to talk to our kvm team about how to distinguish the relaxed
>> ordering in the VM for some vf just like 82599-vf, the probe routine looks like
>> could work like this:
>> 1) QEMU could emulate the platform by the Vender ID and device ID which could be
>>    read from the host.
>> 2) The QEMU could create a virtual PCIe dev complex and recognize the PCIe bus address which
>>    come and detach from the host to the guest.
>> 3) the PCI quirk could enable the Relaxed Ordering by the Vendor ID and Device ID.
>> 4) The VF drivers could read the flag and set to the hw.
>>
>> So I think we could set the PCI_DEV_FLAGS_RELAXED_ORDERING_ENABLED for some special platform
>> and don't enable by default, if I miss something, please not hesitate to enlighten me :)
> 
> This isn't what I had in mind at all.
> 
> So what Casey had originally submitted was a step in the direction of
> what I was thinking. Basically on platforms where it is not advisable
> to enable relaxed ordering we need to advertise that relaxed ordering
> is not safe. Then when we are initializing the devices underneath
> those we need to be clearing the relaxed ordering enable bits in the
> PCI configuration, that is the piece that was missing from Casey's
> original patch. In addition we then need to have a way for devices to
> optionally enable relaxed ordering for cases like Casey pointed out
> where they might want to use relaxed ordering for peer-to-peer
> transactions, but not for transactions to the root complex. Finally in
> the case of the Intel drivers we could then just drop the compile time
> checks entirely and just enable the device to configure the internal
> registers for relaxed ordering because the configuration space becomes
> the spot that controls if this gets enabled or not.
> 
> So the initial set of patches Casey submitted only really consisted of
> 2 patches. What I am proposing is that we would be looking at
> expanding this out to about 4 patches. The first patch is the original
> 1 of 2, the second patch would be to modify the PCI initialization
> code to clear the relaxed ordering enable bit in the event that the
> root complex doesn't want relaxed ordering enabled, the third would be
> to make changes to the Chelsio driver as needed to allow for the
> peer-to-peer case to be enabled when the relaxed ordering bit in the
> configuration space is not enabled without triggering any relaxed
> ordering requests to the root complex, and the last one would be to
> drop the defines in ixgbe and whatever other Intel drivers are
> currently checking for either SPARC or the define that was added to
> support relaxed ordering and just act like we are going to do it
> always with the PCI configuration space controlling if we do or not.
> 
> Ideally as a part of the second patch we should have a way of testing
> if a given path can support relaxed ordering. That way when we go to
> try to enable a peer-to-peer setup we can be certain that a given path
> will work and don't try enabling it in paths that would be unsupported
> for peer-to-peer.
> 
> This ends up being somewhat of a risk for the Intel NICs, but if the
> Chelsio devices have been running with relaxed ordering enabled for
> some time and have identified the chipsets that should be issues, then
> odds are we should be fine as well.
> 
According to your opinion, I try to build the second patch again,
1. in the pcie probe time, the pci_configure_relaxed_ordering will used to set the relaxed ordering
bit for configuration space.
2. export the pcie_get_relaxed_ordering for drivers and the drivers could decide whether should enable the relaxed ordering.

If I still go to the wrong path, please correct me, thanks.

---------------------------------------------------------------
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b01bd5b..0076e4a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4852,6 +4852,28 @@ int pcie_get_mps(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pcie_get_mps);

+int pcie_set_relaxed_ordering(struct pci_dev *dev)
+{
+ return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_set_relaxed_ordering);
+
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+ return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+int pcie_get_relaxed_ordering(struct pci_dev *dev)
+{
+ u16 v;
+
+ pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+ return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
+}
+EXPORT_SYMBOL(pcie_get_relaxed_ordering);
+
 /**
  * pcie_set_mps - set PCI Express maximum payload size
  * @dev: PCI device to query
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19c8950..aeb22b5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1701,6 +1701,16 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
                                         PCI_EXP_DEVCTL_EXT_TAG);
 }

+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+ int ret;
+
+ if (dev && (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))
+         pcie_set_relaxed_ordering(dev);
+ else
+         pcie_clear_relaxed_ordering(dev);
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
        struct hotplug_params hpp;
@@ -1708,6 +1718,7 @@ static void pci_configure_device(struct pci_dev *dev)

        pci_configure_mps(dev);
        pci_configure_extended_tags(dev);
+ pci_configure_relaxed_ordering(dev);

        memset(&hpp, 0, sizeof(hpp));
        ret = pci_get_hp_params(dev, &hpp);


Thanks
Ding


>>> .
>>>
>>
> 
> .
> 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2017-05-27 10:34 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-01 23:13 [PATCH 0/2] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Casey Leedom
2017-05-01 23:13 ` Casey Leedom
2017-05-01 23:13 ` [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING Casey Leedom
2017-05-01 23:13   ` Casey Leedom
2017-05-02  6:49   ` Ding Tianhong
2017-05-02  6:49     ` Ding Tianhong
2017-05-02 16:39   ` Alexander Duyck
2017-05-02 16:39     ` Alexander Duyck
2017-05-02 16:53     ` Raj, Ashok
2017-05-02 16:53       ` Raj, Ashok
2017-05-02 18:10       ` Alexander Duyck
2017-05-02 18:10         ` Alexander Duyck
2017-05-02 19:34         ` Raj, Ashok
2017-05-02 22:41           ` Alexander Duyck
2017-05-03  4:30         ` Casey Leedom
2017-05-03  4:30           ` Casey Leedom
2017-05-03 16:02           ` Alexander Duyck
2017-05-03 16:02             ` Alexander Duyck
2017-05-04 21:01             ` Casey Leedom
2017-05-04 21:01               ` Casey Leedom
2017-05-05 14:04               ` Alexander Duyck
2017-05-05 14:04                 ` Alexander Duyck
2017-05-06  3:08                 ` Ding Tianhong
2017-05-06  3:08                   ` Ding Tianhong
2017-05-06 18:07                   ` Alexander Duyck
2017-05-06 18:07                     ` Alexander Duyck
2017-05-08 14:33                     ` Ding Tianhong
2017-05-08 14:33                       ` Ding Tianhong
2017-05-08 15:22                       ` Alexander Duyck
2017-05-08 15:22                         ` Alexander Duyck
2017-05-09  0:48                     ` Casey Leedom
2017-05-09  0:48                       ` Casey Leedom
2017-05-11  1:15                       ` Ding Tianhong
2017-05-11  1:15                         ` Ding Tianhong
2017-05-16 18:38                         ` Casey Leedom
2017-05-16 18:38                           ` Casey Leedom
2017-05-17 14:49                           ` Alexander Duyck
2017-05-17 14:49                             ` Alexander Duyck
2017-05-25 13:35                       ` Ding Tianhong
2017-05-25 13:35                         ` Ding Tianhong
2017-05-25 19:49                         ` Alexander Duyck
2017-05-25 19:49                           ` Alexander Duyck
2017-05-27 10:34                           ` Ding Tianhong
2017-05-27 10:34                             ` Ding Tianhong
2017-05-02 16:44   ` Raj, Ashok
2017-05-02 16:44     ` Raj, Ashok
2017-05-01 23:13 ` [PATCH 2/2] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Casey Leedom
2017-05-01 23:13   ` Casey Leedom
2017-05-02  0:56 ` [PATCH 0/2] Add " Ding Tianhong
2017-05-02  0:56   ` Ding Tianhong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.