[PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-kernel, Sinan Kaya, intel-wired-lan,
	Jeff Kirsher, linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
  (?)
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0da5aa2..35ca1d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1692,7 +1692,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2453,7 +2453,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -10014,7 +10014,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0da5aa2..35ca1d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1692,7 +1692,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2453,7 +2453,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -10014,7 +10014,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0da5aa2..35ca1d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1692,7 +1692,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2453,7 +2453,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -10014,7 +10014,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Michal Kalderon,
	Ariel Elior, Doug Ledford, Jason Gunthorpe, linux-rdma,
	linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 53f00db..ccd55f4 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1870,7 +1870,7 @@ static int qedr_update_qp_state(struct qedr_dev *dev,
 
 			if (rdma_protocol_roce(&dev->ibdev, 1)) {
 				wmb();
-				writel(qp->rq.db_data.raw, qp->rq.db);
+				writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
 				/* Make sure write takes effect */
 				mmiowb();
 			}
@@ -3247,7 +3247,7 @@ int qedr_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	 * redundant doorbell.
 	 */
 	wmb();
-	writel(qp->sq.db_data.raw, qp->sq.db);
+	writel_relaxed(qp->sq.db_data.raw, qp->sq.db);
 
 	/* Make sure write sticks */
 	mmiowb();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 53f00db..ccd55f4 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1870,7 +1870,7 @@ static int qedr_update_qp_state(struct qedr_dev *dev,
 
 			if (rdma_protocol_roce(&dev->ibdev, 1)) {
 				wmb();
-				writel(qp->rq.db_data.raw, qp->rq.db);
+				writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
 				/* Make sure write takes effect */
 				mmiowb();
 			}
@@ -3247,7 +3247,7 @@ int qedr_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	 * redundant doorbell.
 	 */
 	wmb();
-	writel(qp->sq.db_data.raw, qp->sq.db);
+	writel_relaxed(qp->sq.db_data.raw, qp->sq.db);
 
 	/* Make sure write sticks */
 	mmiowb();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
  (?)
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4214c15..fe3441b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4214c15..fe3441b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 4214c15..fe3441b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
  (?)
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b88fae7..ba8ccb5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b88fae7..ba8ccb5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b88fae7..ba8ccb5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
  (?)
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3dd4aeb..e0e583a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->rdt);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
 	}
 }
 
@@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, hw->hw_addr + rx_ring->rdt);
+		writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3dd4aeb..e0e583a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->rdt);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
 	}
 }
 
@@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, hw->hw_addr + rx_ring->rdt);
+		writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3dd4aeb..e0e583a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->rdt);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
 	}
 }
 
@@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, hw->hw_addr + rx_ring->rdt);
+		writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
 	}
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20 ` Sinan Kaya
  (?)
@ 2018-03-14  3:20   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f695242..64d0e0b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
+/* Assumes caller has executed a write barrier to order memory and device
+ * requests.
+ */
 static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
 {
-	writel(value, ring->tail);
+	writel_relaxed(value, ring->tail);
 }
 
 #define IXGBEVF_RX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9b3d43d..0ba7f59 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 
 	tx_ring->next_to_use = i;
 
+	/* Force memory writes to complete before letting h/w
+	 * know there are new descriptors to fetch.  (Only
+	 * applicable for weak-ordered memory model archs,
+	 * such as IA-64).
+	 */
+	wmb();
+
 	/* notify HW of packet */
 	ixgbevf_write_tail(tx_ring, i);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f695242..64d0e0b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
+/* Assumes caller has executed a write barrier to order memory and device
+ * requests.
+ */
 static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
 {
-	writel(value, ring->tail);
+	writel_relaxed(value, ring->tail);
 }
 
 #define IXGBEVF_RX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9b3d43d..0ba7f59 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 
 	tx_ring->next_to_use = i;
 
+	/* Force memory writes to complete before letting h/w
+	 * know there are new descriptors to fetch.  (Only
+	 * applicable for weak-ordered memory model archs,
+	 * such as IA-64).
+	 */
+	wmb();
+
 	/* notify HW of packet */
 	ixgbevf_write_tail(tx_ring, i);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:20   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:20 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f695242..64d0e0b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
+/* Assumes caller has executed a write barrier to order memory and device
+ * requests.
+ */
 static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
 {
-	writel(value, ring->tail);
+	writel_relaxed(value, ring->tail);
 }
 
 #define IXGBEVF_RX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9b3d43d..0ba7f59 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 
 	tx_ring->next_to_use = i;
 
+	/* Force memory writes to complete before letting h/w
+	 * know there are new descriptors to fetch.  (Only
+	 * applicable for weak-ordered memory model archs,
+	 * such as IA-64).
+	 */
+	wmb();
+
 	/* notify HW of packet */
 	ixgbevf_write_tail(tx_ring, i);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* Re: [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
@ 2018-03-14  4:12     ` Jason Gunthorpe
  -1 siblings, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2018-03-14  4:12 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Michal Kalderon, Ariel Elior, Doug Ledford, linux-rdma,
	linux-kernel

On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Sure matches my understanding of writel_relaxed

This is part of a series, should we take just this patch through the
rdma tree? If not:

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks,
Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  4:12     ` Jason Gunthorpe
  0 siblings, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2018-03-14  4:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Sure matches my understanding of writel_relaxed

This is part of a series, should we take just this patch through the
rdma tree? If not:

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks,
Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-14  5:08     ` Timur Tabi
  -1 siblings, 0 replies; 85+ messages in thread
From: Timur Tabi @ 2018-03-14  5:08 UTC (permalink / raw)
  To: Sinan Kaya, netdev, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Jeff Kirsher, intel-wired-lan,
	linux-kernel

On 3/13/18 10:20 PM, Sinan Kaya wrote:
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>   {
> -	writel(value, ring->tail);
> +	writel_relaxed(value, ring->tail);
>   }

Why not put the wmb() in this function, or just get rid of the wmb() in 
the rest of the file and keep this as writel?  That way, you can avoid 
the comment and the risk that comes with it.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  5:08     ` Timur Tabi
  0 siblings, 0 replies; 85+ messages in thread
From: Timur Tabi @ 2018-03-14  5:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/13/18 10:20 PM, Sinan Kaya wrote:
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>   {
> -	writel(value, ring->tail);
> +	writel_relaxed(value, ring->tail);
>   }

Why not put the wmb() in this function, or just get rid of the wmb() in 
the rest of the file and keep this as writel?  That way, you can avoid 
the comment and the risk that comes with it.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  5:08     ` Timur Tabi
  0 siblings, 0 replies; 85+ messages in thread
From: Timur Tabi @ 2018-03-14  5:08 UTC (permalink / raw)
  To: intel-wired-lan

On 3/13/18 10:20 PM, Sinan Kaya wrote:
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>   {
> -	writel(value, ring->tail);
> +	writel_relaxed(value, ring->tail);
>   }

Why not put the wmb() in this function, or just get rid of the wmb() in 
the rest of the file and keep this as writel?  That way, you can avoid 
the comment and the risk that comes with it.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  4:12     ` Jason Gunthorpe
@ 2018-03-14 12:06       ` okaya at codeaurora.org
  -1 siblings, 0 replies; 85+ messages in thread
From: okaya @ 2018-03-14 12:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Michal Kalderon, Ariel Elior, Doug Ledford, linux-rdma,
	linux-kernel

On 2018-03-14 00:12, Jason Gunthorpe wrote:
> On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>> 
>> This ends up CPU observing two barriers back to back before executing 
>> the
>> register write.
>> 
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>> 
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Sure matches my understanding of writel_relaxed
> 
> This is part of a series, should we take just this patch through the
> rdma tree? If not:
> 
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Feel free to take pieces.


> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 12:06       ` okaya at codeaurora.org
  0 siblings, 0 replies; 85+ messages in thread
From: okaya at codeaurora.org @ 2018-03-14 12:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-03-14 00:12, Jason Gunthorpe wrote:
> On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>> 
>> This ends up CPU observing two barriers back to back before executing 
>> the
>> register write.
>> 
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>> 
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Sure matches my understanding of writel_relaxed
> 
> This is part of a series, should we take just this patch through the
> rdma tree? If not:
> 
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Feel free to take pieces.


> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  5:08     ` Timur Tabi
  (?)
@ 2018-03-14 12:13       ` okaya at codeaurora.org
  -1 siblings, 0 replies; 85+ messages in thread
From: okaya @ 2018-03-14 12:13 UTC (permalink / raw)
  To: Timur Tabi
  Cc: netdev, sulrich, linux-arm-msm, linux-arm-kernel, Jeff Kirsher,
	intel-wired-lan, linux-kernel

On 2018-03-14 01:08, Timur Tabi wrote:
> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>> +/* Assumes caller has executed a write barrier to order memory and 
>> device
>> + * requests.
>> + */
>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 
>> value)
>>   {
>> -	writel(value, ring->tail);
>> +	writel_relaxed(value, ring->tail);
>>   }
> 
> Why not put the wmb() in this function, or just get rid of the wmb()
> in the rest of the file and keep this as writel?  That way, you can
> avoid the comment and the risk that comes with it.


Sure, both solutions will work. I want to see what the maintainer 
prefers. I can repost accordingly.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 12:13       ` okaya at codeaurora.org
  0 siblings, 0 replies; 85+ messages in thread
From: okaya at codeaurora.org @ 2018-03-14 12:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-03-14 01:08, Timur Tabi wrote:
> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>> +/* Assumes caller has executed a write barrier to order memory and 
>> device
>> + * requests.
>> + */
>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 
>> value)
>>   {
>> -	writel(value, ring->tail);
>> +	writel_relaxed(value, ring->tail);
>>   }
> 
> Why not put the wmb() in this function, or just get rid of the wmb()
> in the rest of the file and keep this as writel?  That way, you can
> avoid the comment and the risk that comes with it.


Sure, both solutions will work. I want to see what the maintainer 
prefers. I can repost accordingly.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 12:13       ` okaya at codeaurora.org
  0 siblings, 0 replies; 85+ messages in thread
From: okaya @ 2018-03-14 12:13 UTC (permalink / raw)
  To: intel-wired-lan

On 2018-03-14 01:08, Timur Tabi wrote:
> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>> +/* Assumes caller has executed a write barrier to order memory and 
>> device
>> + * requests.
>> + */
>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 
>> value)
>>   {
>> -	writel(value, ring->tail);
>> +	writel_relaxed(value, ring->tail);
>>   }
> 
> Why not put the wmb() in this function, or just get rid of the wmb()
> in the rest of the file and keep this as writel?  That way, you can
> avoid the comment and the risk that comes with it.


Sure, both solutions will work. I want to see what the maintainer 
prefers. I can repost accordingly.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14 12:13       ` okaya at codeaurora.org
  (?)
@ 2018-03-14 21:49         ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-14 21:49 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
> On 2018-03-14 01:08, Timur Tabi wrote:
>>
>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>> +/* Assumes caller has executed a write barrier to order memory and
>>> device
>>> + * requests.
>>> + */
>>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32
>>> value)
>>>   {
>>> -       writel(value, ring->tail);
>>> +       writel_relaxed(value, ring->tail);
>>>   }
>>
>>
>> Why not put the wmb() in this function, or just get rid of the wmb()
>> in the rest of the file and keep this as writel?  That way, you can
>> avoid the comment and the risk that comes with it.
>
>
>
> Sure, both solutions will work. I want to see what the maintainer prefers. I
> can repost accordingly.

Actually I would argue this whole patch set is pointless. For starters
why is it we are only updating the Intel Ethernet drivers here? This
seems like something that is going to impact the whole kernel tree
since many of us have been writing drivers for some time assuming x86
style behavior.

It doesn't make sense to be optimizing the drivers for one subset of
architectures. The scope of the work needed to update the drivers for
this would be ridiculous. Also I don't see how this could be expected
to work on any other architecture when we pretty much need to have a
wmb() before calling the writel on x86 to deal with accesses between
coherent and non-coherent memory. It seems to me more like somebody
added what they considered to be an optimization somewhere that is a
workaround for a poorly written driver. Either that or the barrier is
serving a different purpose then the one we were using.

It would make more sense to put in the effort making writel and
writel_relaxed consistent between architectures before we go through
and start modifying driver code to support different architectures.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 21:49         ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-14 21:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
> On 2018-03-14 01:08, Timur Tabi wrote:
>>
>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>> +/* Assumes caller has executed a write barrier to order memory and
>>> device
>>> + * requests.
>>> + */
>>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32
>>> value)
>>>   {
>>> -       writel(value, ring->tail);
>>> +       writel_relaxed(value, ring->tail);
>>>   }
>>
>>
>> Why not put the wmb() in this function, or just get rid of the wmb()
>> in the rest of the file and keep this as writel?  That way, you can
>> avoid the comment and the risk that comes with it.
>
>
>
> Sure, both solutions will work. I want to see what the maintainer prefers. I
> can repost accordingly.

Actually I would argue this whole patch set is pointless. For starters
why is it we are only updating the Intel Ethernet drivers here? This
seems like something that is going to impact the whole kernel tree
since many of us have been writing drivers for some time assuming x86
style behavior.

It doesn't make sense to be optimizing the drivers for one subset of
architectures. The scope of the work needed to update the drivers for
this would be ridiculous. Also I don't see how this could be expected
to work on any other architecture when we pretty much need to have a
wmb() before calling the writel on x86 to deal with accesses between
coherent and non-coherent memory. It seems to me more like somebody
added what they considered to be an optimization somewhere that is a
workaround for a poorly written driver. Either that or the barrier is
serving a different purpose then the one we were using.

It would make more sense to put in the effort making writel and
writel_relaxed consistent between architectures before we go through
and start modifying driver code to support different architectures.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 21:49         ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-14 21:49 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
> On 2018-03-14 01:08, Timur Tabi wrote:
>>
>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>> +/* Assumes caller has executed a write barrier to order memory and
>>> device
>>> + * requests.
>>> + */
>>>   static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32
>>> value)
>>>   {
>>> -       writel(value, ring->tail);
>>> +       writel_relaxed(value, ring->tail);
>>>   }
>>
>>
>> Why not put the wmb() in this function, or just get rid of the wmb()
>> in the rest of the file and keep this as writel?  That way, you can
>> avoid the comment and the risk that comes with it.
>
>
>
> Sure, both solutions will work. I want to see what the maintainer prefers. I
> can repost accordingly.

Actually I would argue this whole patch set is pointless. For starters
why is it we are only updating the Intel Ethernet drivers here? This
seems like something that is going to impact the whole kernel tree
since many of us have been writing drivers for some time assuming x86
style behavior.

It doesn't make sense to be optimizing the drivers for one subset of
architectures. The scope of the work needed to update the drivers for
this would be ridiculous. Also I don't see how this could be expected
to work on any other architecture when we pretty much need to have a
wmb() before calling the writel on x86 to deal with accesses between
coherent and non-coherent memory. It seems to me more like somebody
added what they considered to be an optimization somewhere that is a
workaround for a poorly written driver. Either that or the barrier is
serving a different purpose then the one we were using.

It would make more sense to put in the effort making writel and
writel_relaxed consistent between architectures before we go through
and start modifying driver code to support different architectures.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14 21:49         ` Alexander Duyck
  (?)
@ 2018-03-14 22:57           ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14 22:57 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

Hi Alexander,

On 3/14/2018 5:49 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>
>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:

> Actually I would argue this whole patch set is pointless. For starters
> why is it we are only updating the Intel Ethernet drivers here? 

I did a regex search for wmb() followed by writel() in each drivers directory.
I scrubbed the ones I care about and posted this series. Note also that
I have one Infiniband patch in the series. 

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering. 

> This
> seems like something that is going to impact the whole kernel tree
> since many of us have been writing drivers for some time assuming x86
> style behavior.

That's true. We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers. 

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

> 
> It doesn't make sense to be optimizing the drivers for one subset of
> architectures. The scope of the work needed to update the drivers for
> this would be ridiculous. Also I don't see how this could be expected
> to work on any other architecture when we pretty much need to have a
> wmb() before calling the writel on x86 to deal with accesses between
> coherent and non-coherent memory. It seems to me more like somebody
> added what they considered to be an optimization somewhere that is a
> workaround for a poorly written driver. Either that or the barrier is
> serving a different purpose then the one we were using.

Is there a semantic problem with the definition of wmb() vs. writel() vs.
writel_relaxed()? I thought everything is well described in barriers
document about what to expect from these APIs.

AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
It doesn't really change anything for x86 but it saves barriers on
other architectures.

> 
> It would make more sense to put in the effort making writel and
> writel_relaxed consistent between architectures before we go through
> and start modifying driver code to support different architectures.
> 

Is there an arch problem that I'm not seeing?

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 22:57           ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14 22:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Alexander,

On 3/14/2018 5:49 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>
>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:

> Actually I would argue this whole patch set is pointless. For starters
> why is it we are only updating the Intel Ethernet drivers here? 

I did a regex search for wmb() followed by writel() in each drivers directory.
I scrubbed the ones I care about and posted this series. Note also that
I have one Infiniband patch in the series. 

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering. 

> This
> seems like something that is going to impact the whole kernel tree
> since many of us have been writing drivers for some time assuming x86
> style behavior.

That's true. We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers. 

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

> 
> It doesn't make sense to be optimizing the drivers for one subset of
> architectures. The scope of the work needed to update the drivers for
> this would be ridiculous. Also I don't see how this could be expected
> to work on any other architecture when we pretty much need to have a
> wmb() before calling the writel on x86 to deal with accesses between
> coherent and non-coherent memory. It seems to me more like somebody
> added what they considered to be an optimization somewhere that is a
> workaround for a poorly written driver. Either that or the barrier is
> serving a different purpose then the one we were using.

Is there a semantic problem with the definition of wmb() vs. writel() vs.
writel_relaxed()? I thought everything is well described in barriers
document about what to expect from these APIs.

AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
It doesn't really change anything for x86 but it saves barriers on
other architectures.

> 
> It would make more sense to put in the effort making writel and
> writel_relaxed consistent between architectures before we go through
> and start modifying driver code to support different architectures.
> 

Is there an arch problem that I'm not seeing?

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14 22:57           ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14 22:57 UTC (permalink / raw)
  To: intel-wired-lan

Hi Alexander,

On 3/14/2018 5:49 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>
>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:

> Actually I would argue this whole patch set is pointless. For starters
> why is it we are only updating the Intel Ethernet drivers here? 

I did a regex search for wmb() followed by writel() in each drivers directory.
I scrubbed the ones I care about and posted this series. Note also that
I have one Infiniband patch in the series. 

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering. 

> This
> seems like something that is going to impact the whole kernel tree
> since many of us have been writing drivers for some time assuming x86
> style behavior.

That's true. We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers. 

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

> 
> It doesn't make sense to be optimizing the drivers for one subset of
> architectures. The scope of the work needed to update the drivers for
> this would be ridiculous. Also I don't see how this could be expected
> to work on any other architecture when we pretty much need to have a
> wmb() before calling the writel on x86 to deal with accesses between
> coherent and non-coherent memory. It seems to me more like somebody
> added what they considered to be an optimization somewhere that is a
> workaround for a poorly written driver. Either that or the barrier is
> serving a different purpose then the one we were using.

Is there a semantic problem with the definition of wmb() vs. writel() vs.
writel_relaxed()? I thought everything is well described in barriers
document about what to expect from these APIs.

AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
It doesn't really change anything for x86 but it saves barriers on
other architectures.

> 
> It would make more sense to put in the effort making writel and
> writel_relaxed consistent between architectures before we go through
> and start modifying driver code to support different architectures.
> 

Is there an arch problem that I'm not seeing?

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-15  1:41     ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:41 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 3dd4aeb..e0e583a 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
>         }
>  }
>
> @@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, hw->hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
>         }
>  }
>

So you missed the writel in e1000_xmit_frame. You should probably get
that one too while you are doing these updates. The wmb() is in
e1000_tx_queue().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:41     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 3dd4aeb..e0e583a 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
>         }
>  }
>
> @@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, hw->hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
>         }
>  }
>

So you missed the writel in e1000_xmit_frame. You should probably get
that one too while you are doing these updates. The wmb() is in
e1000_tx_queue().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:41     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:41 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index 3dd4aeb..e0e583a 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -4573,7 +4573,7 @@ e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->rdt);
>         }
>  }
>
> @@ -4688,7 +4688,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, hw->hw_addr + rx_ring->rdt);
> +               writel_relaxed(i, hw->hw_addr + rx_ring->rdt);
>         }
>  }
>

So you missed the writel in e1000_xmit_frame. You should probably get
that one too while you are doing these updates. The wmb() is in
e1000_tx_queue().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14 22:57           ` Sinan Kaya
  (?)
@ 2018-03-15  1:44             ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:44 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Hi Alexander,
>
> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>
>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>
>> Actually I would argue this whole patch set is pointless. For starters
>> why is it we are only updating the Intel Ethernet drivers here?
>
> I did a regex search for wmb() followed by writel() in each drivers directory.
> I scrubbed the ones I care about and posted this series. Note also that
> I have one Infiniband patch in the series.

I didn't see it as it I was only looking at the patches that had ended
up in intel-wired-lan. Also was there a cover page, I couldn't seem to
find that on LKML.

> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.

It might be advisable to break things up to subsystem or family. So
for example if you are going to update the Intel Ethernet drivers I
would focus on that and maybe spin the infiniband patch of into a
separate set that can be applied to a separate tree. This is something
I would consider more of a driver optimization than a fix. In our case
it makes it easier for us to maintain the patches to the Intel drivers
if you could submit just those to Jeff and Intel-wired-lan so that we
can take care of test and review as well as figure out what other
drivers will would still need to update in order to handle all the
cases involved in this.

>> This
>> seems like something that is going to impact the whole kernel tree
>> since many of us have been writing drivers for some time assuming x86
>> style behavior.
>
> That's true. We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.

My only real objection is that you are going to be having to scrub
pretty much ALL the drivers. It seems a little like trying to fix a
bad tire on your car by paving the road to match the shape of the
tire.

>>
>> It doesn't make sense to be optimizing the drivers for one subset of
>> architectures. The scope of the work needed to update the drivers for
>> this would be ridiculous. Also I don't see how this could be expected
>> to work on any other architecture when we pretty much need to have a
>> wmb() before calling the writel on x86 to deal with accesses between
>> coherent and non-coherent memory. It seems to me more like somebody
>> added what they considered to be an optimization somewhere that is a
>> workaround for a poorly written driver. Either that or the barrier is
>> serving a different purpose then the one we were using.
>
> Is there a semantic problem with the definition of wmb() vs. writel() vs.
> writel_relaxed()? I thought everything is well described in barriers
> document about what to expect from these APIs.
>
> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
> It doesn't really change anything for x86 but it saves barriers on
> other architectures.

Yeah. I had to go through and do some review since my concerns have
been PowerPC, IA64, and x86 historically. From what I can tell all
those architectures are setup the same way so that shouldn't be an
issue.

>>
>> It would make more sense to put in the effort making writel and
>> writel_relaxed consistent between architectures before we go through
>> and start modifying driver code to support different architectures.
>>
>
> Is there an arch problem that I'm not seeing?
>
> Sinan

It isn't really an arch problem I have as a logistical one. It just
seems like this is really backwards in terms of how this has been
handled. For the x86 we have historically had to deal with the
barriers for this kind of stuff ourselves, now for ARM and a couple
other architectures they seem to have incorporated the barriers into
writel and are expecting everyone to move over to writel_relaxed. It
seems like instead of going that route they should have probably just
looked at pushing the ARM drivers to something like a "writel_strict"
and adopted the behavior of the other architectures for writel.

I'll go back through and review. It looks like a number of items were missed.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:44             ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Hi Alexander,
>
> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>
>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>
>> Actually I would argue this whole patch set is pointless. For starters
>> why is it we are only updating the Intel Ethernet drivers here?
>
> I did a regex search for wmb() followed by writel() in each drivers directory.
> I scrubbed the ones I care about and posted this series. Note also that
> I have one Infiniband patch in the series.

I didn't see it as it I was only looking at the patches that had ended
up in intel-wired-lan. Also was there a cover page, I couldn't seem to
find that on LKML.

> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.

It might be advisable to break things up to subsystem or family. So
for example if you are going to update the Intel Ethernet drivers I
would focus on that and maybe spin the infiniband patch of into a
separate set that can be applied to a separate tree. This is something
I would consider more of a driver optimization than a fix. In our case
it makes it easier for us to maintain the patches to the Intel drivers
if you could submit just those to Jeff and Intel-wired-lan so that we
can take care of test and review as well as figure out what other
drivers will would still need to update in order to handle all the
cases involved in this.

>> This
>> seems like something that is going to impact the whole kernel tree
>> since many of us have been writing drivers for some time assuming x86
>> style behavior.
>
> That's true. We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.

My only real objection is that you are going to be having to scrub
pretty much ALL the drivers. It seems a little like trying to fix a
bad tire on your car by paving the road to match the shape of the
tire.

>>
>> It doesn't make sense to be optimizing the drivers for one subset of
>> architectures. The scope of the work needed to update the drivers for
>> this would be ridiculous. Also I don't see how this could be expected
>> to work on any other architecture when we pretty much need to have a
>> wmb() before calling the writel on x86 to deal with accesses between
>> coherent and non-coherent memory. It seems to me more like somebody
>> added what they considered to be an optimization somewhere that is a
>> workaround for a poorly written driver. Either that or the barrier is
>> serving a different purpose then the one we were using.
>
> Is there a semantic problem with the definition of wmb() vs. writel() vs.
> writel_relaxed()? I thought everything is well described in barriers
> document about what to expect from these APIs.
>
> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
> It doesn't really change anything for x86 but it saves barriers on
> other architectures.

Yeah. I had to go through and do some review since my concerns have
been PowerPC, IA64, and x86 historically. From what I can tell all
those architectures are setup the same way so that shouldn't be an
issue.

>>
>> It would make more sense to put in the effort making writel and
>> writel_relaxed consistent between architectures before we go through
>> and start modifying driver code to support different architectures.
>>
>
> Is there an arch problem that I'm not seeing?
>
> Sinan

It isn't really an arch problem I have as a logistical one. It just
seems like this is really backwards in terms of how this has been
handled. For the x86 we have historically had to deal with the
barriers for this kind of stuff ourselves, now for ARM and a couple
other architectures they seem to have incorporated the barriers into
writel and are expecting everyone to move over to writel_relaxed. It
seems like instead of going that route they should have probably just
looked at pushing the ARM drivers to something like a "writel_strict"
and adopted the behavior of the other architectures for writel.

I'll go back through and review. It looks like a number of items were missed.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:44             ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:44 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Hi Alexander,
>
> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>
>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>
>> Actually I would argue this whole patch set is pointless. For starters
>> why is it we are only updating the Intel Ethernet drivers here?
>
> I did a regex search for wmb() followed by writel() in each drivers directory.
> I scrubbed the ones I care about and posted this series. Note also that
> I have one Infiniband patch in the series.

I didn't see it as it I was only looking at the patches that had ended
up in intel-wired-lan. Also was there a cover page, I couldn't seem to
find that on LKML.

> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.

It might be advisable to break things up to subsystem or family. So
for example if you are going to update the Intel Ethernet drivers I
would focus on that and maybe spin the infiniband patch of into a
separate set that can be applied to a separate tree. This is something
I would consider more of a driver optimization than a fix. In our case
it makes it easier for us to maintain the patches to the Intel drivers
if you could submit just those to Jeff and Intel-wired-lan so that we
can take care of test and review as well as figure out what other
drivers will would still need to update in order to handle all the
cases involved in this.

>> This
>> seems like something that is going to impact the whole kernel tree
>> since many of us have been writing drivers for some time assuming x86
>> style behavior.
>
> That's true. We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.

My only real objection is that you are going to be having to scrub
pretty much ALL the drivers. It seems a little like trying to fix a
bad tire on your car by paving the road to match the shape of the
tire.

>>
>> It doesn't make sense to be optimizing the drivers for one subset of
>> architectures. The scope of the work needed to update the drivers for
>> this would be ridiculous. Also I don't see how this could be expected
>> to work on any other architecture when we pretty much need to have a
>> wmb() before calling the writel on x86 to deal with accesses between
>> coherent and non-coherent memory. It seems to me more like somebody
>> added what they considered to be an optimization somewhere that is a
>> workaround for a poorly written driver. Either that or the barrier is
>> serving a different purpose then the one we were using.
>
> Is there a semantic problem with the definition of wmb() vs. writel() vs.
> writel_relaxed()? I thought everything is well described in barriers
> document about what to expect from these APIs.
>
> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
> It doesn't really change anything for x86 but it saves barriers on
> other architectures.

Yeah. I had to go through and do some review since my concerns have
been PowerPC, IA64, and x86 historically. From what I can tell all
those architectures are setup the same way so that shouldn't be an
issue.

>>
>> It would make more sense to put in the effort making writel and
>> writel_relaxed consistent between architectures before we go through
>> and start modifying driver code to support different architectures.
>>
>
> Is there an arch problem that I'm not seeing?
>
> Sinan

It isn't really an arch problem I have as a logistical one. It just
seems like this is really backwards in terms of how this has been
handled. For the x86 we have historically had to deal with the
barriers for this kind of stuff ourselves, now for ARM and a couple
other architectures they seem to have incorporated the barriers into
writel and are expecting everyone to move over to writel_relaxed. It
seems like instead of going that route they should have probably just
looked at pushing the ARM drivers to something like a "writel_strict"
and adopted the behavior of the other architectures for writel.

I'll go back through and review. It looks like a number of items were missed.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-15  1:45     ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:45 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> index f695242..64d0e0b 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> @@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>  }
>
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>  static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>  {
> -       writel(value, ring->tail);
> +       writel_relaxed(value, ring->tail);
>  }
>
>  #define IXGBEVF_RX_DESC(R, i)  \
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 9b3d43d..0ba7f59 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
>
>         tx_ring->next_to_use = i;
>
> +       /* Force memory writes to complete before letting h/w
> +        * know there are new descriptors to fetch.  (Only
> +        * applicable for weak-ordered memory model archs,
> +        * such as IA-64).
> +        */
> +       wmb();

This memory barrier is redundant. There is a wmb() that is called
about 10 lines before this.

> +
>         /* notify HW of packet */
>         ixgbevf_write_tail(tx_ring, i);
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:45     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> index f695242..64d0e0b 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> @@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>  }
>
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>  static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>  {
> -       writel(value, ring->tail);
> +       writel_relaxed(value, ring->tail);
>  }
>
>  #define IXGBEVF_RX_DESC(R, i)  \
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 9b3d43d..0ba7f59 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
>
>         tx_ring->next_to_use = i;
>
> +       /* Force memory writes to complete before letting h/w
> +        * know there are new descriptors to fetch.  (Only
> +        * applicable for weak-ordered memory model archs,
> +        * such as IA-64).
> +        */
> +       wmb();

This memory barrier is redundant. There is a wmb() that is called
about 10 lines before this.

> +
>         /* notify HW of packet */
>         ixgbevf_write_tail(tx_ring, i);
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:45     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:45 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 ++++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 +++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> index f695242..64d0e0b 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
> @@ -244,9 +244,12 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
>         return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
>  }
>
> +/* Assumes caller has executed a write barrier to order memory and device
> + * requests.
> + */
>  static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
>  {
> -       writel(value, ring->tail);
> +       writel_relaxed(value, ring->tail);
>  }
>
>  #define IXGBEVF_RX_DESC(R, i)  \
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 9b3d43d..0ba7f59 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3643,6 +3643,13 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
>
>         tx_ring->next_to_use = i;
>
> +       /* Force memory writes to complete before letting h/w
> +        * know there are new descriptors to fetch.  (Only
> +        * applicable for weak-ordered memory model archs,
> +        * such as IA-64).
> +        */
> +       wmb();

This memory barrier is redundant. There is a wmb() that is called
about 10 lines before this.

> +
>         /* notify HW of packet */
>         ixgbevf_write_tail(tx_ring, i);
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-15  1:47     ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:47 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

In this patch you missed the writel at the end of ixgbe_tx_map.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:47     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

In this patch you missed the writel at the end of ixgbe_tx_map.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:47     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:47 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

In this patch you missed the writel at the end of ixgbe_tx_map.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-15  1:48     ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:48 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
> index 4214c15..fe3441b 100644
> --- a/drivers/net/ethernet/intel/igbvf/netdev.c
> +++ b/drivers/net/ethernet/intel/igbvf/netdev.c
> @@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
>                  * such as IA-64).
>                 */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->tail);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igbvf_tx_queue_adv().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:48     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
> index 4214c15..fe3441b 100644
> --- a/drivers/net/ethernet/intel/igbvf/netdev.c
> +++ b/drivers/net/ethernet/intel/igbvf/netdev.c
> @@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
>                  * such as IA-64).
>                 */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->tail);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igbvf_tx_queue_adv().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 4/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:48     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:48 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
> index 4214c15..fe3441b 100644
> --- a/drivers/net/ethernet/intel/igbvf/netdev.c
> +++ b/drivers/net/ethernet/intel/igbvf/netdev.c
> @@ -251,7 +251,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
>                  * such as IA-64).
>                 */
>                 wmb();
> -               writel(i, adapter->hw.hw_addr + rx_ring->tail);
> +               writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igbvf_tx_queue_adv().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
  (?)
@ 2018-03-15  1:50     ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:50 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b88fae7..ba8ccb5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, rx_ring->tail);
> +               writel_relaxed(i, rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igb_tx_map().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:50     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b88fae7..ba8ccb5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, rx_ring->tail);
> +               writel_relaxed(i, rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igb_tx_map().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 5/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:50     ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:50 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 13, 2018 at 8:20 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index b88fae7..ba8ccb5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
>                  * such as IA-64).
>                  */
>                 wmb();
> -               writel(i, rx_ring->tail);
> +               writel_relaxed(i, rx_ring->tail);
>         }
>  }
>

This one missed the writel at the end of igb_tx_map().

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15  1:44             ` Alexander Duyck
  (?)
@ 2018-03-15  2:17               ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15  2:17 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On 3/14/2018 9:44 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Hi Alexander,
>>
>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>
>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>
>>> Actually I would argue this whole patch set is pointless. For starters
>>> why is it we are only updating the Intel Ethernet drivers here?
>>
>> I did a regex search for wmb() followed by writel() in each drivers directory.
>> I scrubbed the ones I care about and posted this series. Note also that
>> I have one Infiniband patch in the series.
> 
> I didn't see it as it I was only looking at the patches that had ended
> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
> find that on LKML.

Yeah, I didn't have a cover page. These patches were sitting on my branch
for a while. I wanted to get them out without putting too much effort into
it. I'll add it on the next version.

> 
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
> 
> It might be advisable to break things up to subsystem or family. So
> for example if you are going to update the Intel Ethernet drivers I
> would focus on that and maybe spin the infiniband patch of into a
> separate set that can be applied to a separate tree. This is something
> I would consider more of a driver optimization than a fix. In our case
> it makes it easier for us to maintain the patches to the Intel drivers
> if you could submit just those to Jeff and Intel-wired-lan so that we
> can take care of test and review as well as figure out what other
> drivers will would still need to update in order to handle all the
> cases involved in this.
> 
>>> This
>>> seems like something that is going to impact the whole kernel tree
>>> since many of us have been writing drivers for some time assuming x86
>>> style behavior.
>>
>> That's true. We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
> 
> My only real objection is that you are going to be having to scrub
> pretty much ALL the drivers. It seems a little like trying to fix a
> bad tire on your car by paving the road to match the shape of the
> tire.

Or we start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

We could do several iterations like you are suggesting for each subsystem.

> 
>>>
>>> It doesn't make sense to be optimizing the drivers for one subset of
>>> architectures. The scope of the work needed to update the drivers for
>>> this would be ridiculous. Also I don't see how this could be expected
>>> to work on any other architecture when we pretty much need to have a
>>> wmb() before calling the writel on x86 to deal with accesses between
>>> coherent and non-coherent memory. It seems to me more like somebody
>>> added what they considered to be an optimization somewhere that is a
>>> workaround for a poorly written driver. Either that or the barrier is
>>> serving a different purpose then the one we were using.
>>
>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>> writel_relaxed()? I thought everything is well described in barriers
>> document about what to expect from these APIs.
>>
>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>> It doesn't really change anything for x86 but it saves barriers on
>> other architectures.
> 
> Yeah. I had to go through and do some review since my concerns have
> been PowerPC, IA64, and x86 historically. From what I can tell all
> those architectures are setup the same way so that shouldn't be an
> issue.

OK, glad that we are in common understanding.

> 
>>>
>>> It would make more sense to put in the effort making writel and
>>> writel_relaxed consistent between architectures before we go through
>>> and start modifying driver code to support different architectures.
>>>
>>
>> Is there an arch problem that I'm not seeing?
>>
>> Sinan
> 
> It isn't really an arch problem I have as a logistical one. It just
> seems like this is really backwards in terms of how this has been
> handled. For the x86 we have historically had to deal with the
> barriers for this kind of stuff ourselves, now for ARM and a couple
> other architectures they seem to have incorporated the barriers into
> writel and are expecting everyone to move over to writel_relaxed. 

You want to move to writel_relaxed() only if you know that your register
accesses won't have any side effects. 

If you require some memory update to be observable to the HW before
doing a register write, right thing is to do 

wmb() + writel_relaxed()

wmb() + writel() is clearly the wrong choice and that's what the goal of
this change.

If we know that all writel() adaptations in all architectures guarantee
observability, another option is to get rid of wmb() and just keep writel().

I'm not so convinced about this and hoping that someone will correct me. 

wmb() on x86 seems to have an sfence but writel() seems to have a compiler
barrier in it. So, the type of barrier wmb() is using is different. 

we can't say

(wmb()+ writel_relaxed()) == writel()

for all architectures, but maybe I'm wrong.

We really don't want to convert all writel() to writel_relaxed() blindly
without giving much thought into it.

This was also another reason why I limited the changes to wmb() + writel()
combinations only.

If there is wmb() + code + writel() and if we convert this to wmb() + code +
writel_relaxed(), code will not be observed by the HW and this might break
the driver.

> It
> seems like instead of going that route they should have probably just
> looked at pushing the ARM drivers to something like a "writel_strict"
> and adopted the behavior of the other architectures for writel.
> 
> I'll go back through and review. It looks like a number of items were missed.
> 

OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
I wanted to raise it here first before inspecting the rest.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  2:17               ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15  2:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/14/2018 9:44 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Hi Alexander,
>>
>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>
>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>
>>> Actually I would argue this whole patch set is pointless. For starters
>>> why is it we are only updating the Intel Ethernet drivers here?
>>
>> I did a regex search for wmb() followed by writel() in each drivers directory.
>> I scrubbed the ones I care about and posted this series. Note also that
>> I have one Infiniband patch in the series.
> 
> I didn't see it as it I was only looking at the patches that had ended
> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
> find that on LKML.

Yeah, I didn't have a cover page. These patches were sitting on my branch
for a while. I wanted to get them out without putting too much effort into
it. I'll add it on the next version.

> 
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
> 
> It might be advisable to break things up to subsystem or family. So
> for example if you are going to update the Intel Ethernet drivers I
> would focus on that and maybe spin the infiniband patch of into a
> separate set that can be applied to a separate tree. This is something
> I would consider more of a driver optimization than a fix. In our case
> it makes it easier for us to maintain the patches to the Intel drivers
> if you could submit just those to Jeff and Intel-wired-lan so that we
> can take care of test and review as well as figure out what other
> drivers will would still need to update in order to handle all the
> cases involved in this.
> 
>>> This
>>> seems like something that is going to impact the whole kernel tree
>>> since many of us have been writing drivers for some time assuming x86
>>> style behavior.
>>
>> That's true. We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
> 
> My only real objection is that you are going to be having to scrub
> pretty much ALL the drivers. It seems a little like trying to fix a
> bad tire on your car by paving the road to match the shape of the
> tire.

Or we start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

We could do several iterations like you are suggesting for each subsystem.

> 
>>>
>>> It doesn't make sense to be optimizing the drivers for one subset of
>>> architectures. The scope of the work needed to update the drivers for
>>> this would be ridiculous. Also I don't see how this could be expected
>>> to work on any other architecture when we pretty much need to have a
>>> wmb() before calling the writel on x86 to deal with accesses between
>>> coherent and non-coherent memory. It seems to me more like somebody
>>> added what they considered to be an optimization somewhere that is a
>>> workaround for a poorly written driver. Either that or the barrier is
>>> serving a different purpose then the one we were using.
>>
>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>> writel_relaxed()? I thought everything is well described in barriers
>> document about what to expect from these APIs.
>>
>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>> It doesn't really change anything for x86 but it saves barriers on
>> other architectures.
> 
> Yeah. I had to go through and do some review since my concerns have
> been PowerPC, IA64, and x86 historically. From what I can tell all
> those architectures are setup the same way so that shouldn't be an
> issue.

OK, glad that we are in common understanding.

> 
>>>
>>> It would make more sense to put in the effort making writel and
>>> writel_relaxed consistent between architectures before we go through
>>> and start modifying driver code to support different architectures.
>>>
>>
>> Is there an arch problem that I'm not seeing?
>>
>> Sinan
> 
> It isn't really an arch problem I have as a logistical one. It just
> seems like this is really backwards in terms of how this has been
> handled. For the x86 we have historically had to deal with the
> barriers for this kind of stuff ourselves, now for ARM and a couple
> other architectures they seem to have incorporated the barriers into
> writel and are expecting everyone to move over to writel_relaxed. 

You want to move to writel_relaxed() only if you know that your register
accesses won't have any side effects. 

If you require some memory update to be observable to the HW before
doing a register write, right thing is to do 

wmb() + writel_relaxed()

wmb() + writel() is clearly the wrong choice and that's what the goal of
this change.

If we know that all writel() adaptations in all architectures guarantee
observability, another option is to get rid of wmb() and just keep writel().

I'm not so convinced about this and hoping that someone will correct me. 

wmb() on x86 seems to have an sfence but writel() seems to have a compiler
barrier in it. So, the type of barrier wmb() is using is different. 

we can't say

(wmb()+ writel_relaxed()) == writel()

for all architectures, but maybe I'm wrong.

We really don't want to convert all writel() to writel_relaxed() blindly
without giving much thought into it.

This was also another reason why I limited the changes to wmb() + writel()
combinations only.

If there is wmb() + code + writel() and if we convert this to wmb() + code +
writel_relaxed(), code will not be observed by the HW and this might break
the driver.

> It
> seems like instead of going that route they should have probably just
> looked at pushing the ARM drivers to something like a "writel_strict"
> and adopted the behavior of the other architectures for writel.
> 
> I'll go back through and review. It looks like a number of items were missed.
> 

OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
I wanted to raise it here first before inspecting the rest.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  2:17               ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15  2:17 UTC (permalink / raw)
  To: intel-wired-lan

On 3/14/2018 9:44 PM, Alexander Duyck wrote:
> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Hi Alexander,
>>
>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>
>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>
>>> Actually I would argue this whole patch set is pointless. For starters
>>> why is it we are only updating the Intel Ethernet drivers here?
>>
>> I did a regex search for wmb() followed by writel() in each drivers directory.
>> I scrubbed the ones I care about and posted this series. Note also that
>> I have one Infiniband patch in the series.
> 
> I didn't see it as it I was only looking at the patches that had ended
> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
> find that on LKML.

Yeah, I didn't have a cover page. These patches were sitting on my branch
for a while. I wanted to get them out without putting too much effort into
it. I'll add it on the next version.

> 
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
> 
> It might be advisable to break things up to subsystem or family. So
> for example if you are going to update the Intel Ethernet drivers I
> would focus on that and maybe spin the infiniband patch of into a
> separate set that can be applied to a separate tree. This is something
> I would consider more of a driver optimization than a fix. In our case
> it makes it easier for us to maintain the patches to the Intel drivers
> if you could submit just those to Jeff and Intel-wired-lan so that we
> can take care of test and review as well as figure out what other
> drivers will would still need to update in order to handle all the
> cases involved in this.
> 
>>> This
>>> seems like something that is going to impact the whole kernel tree
>>> since many of us have been writing drivers for some time assuming x86
>>> style behavior.
>>
>> That's true. We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
> 
> My only real objection is that you are going to be having to scrub
> pretty much ALL the drivers. It seems a little like trying to fix a
> bad tire on your car by paving the road to match the shape of the
> tire.

Or we start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

We could do several iterations like you are suggesting for each subsystem.

> 
>>>
>>> It doesn't make sense to be optimizing the drivers for one subset of
>>> architectures. The scope of the work needed to update the drivers for
>>> this would be ridiculous. Also I don't see how this could be expected
>>> to work on any other architecture when we pretty much need to have a
>>> wmb() before calling the writel on x86 to deal with accesses between
>>> coherent and non-coherent memory. It seems to me more like somebody
>>> added what they considered to be an optimization somewhere that is a
>>> workaround for a poorly written driver. Either that or the barrier is
>>> serving a different purpose then the one we were using.
>>
>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>> writel_relaxed()? I thought everything is well described in barriers
>> document about what to expect from these APIs.
>>
>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>> It doesn't really change anything for x86 but it saves barriers on
>> other architectures.
> 
> Yeah. I had to go through and do some review since my concerns have
> been PowerPC, IA64, and x86 historically. From what I can tell all
> those architectures are setup the same way so that shouldn't be an
> issue.

OK, glad that we are in common understanding.

> 
>>>
>>> It would make more sense to put in the effort making writel and
>>> writel_relaxed consistent between architectures before we go through
>>> and start modifying driver code to support different architectures.
>>>
>>
>> Is there an arch problem that I'm not seeing?
>>
>> Sinan
> 
> It isn't really an arch problem I have as a logistical one. It just
> seems like this is really backwards in terms of how this has been
> handled. For the x86 we have historically had to deal with the
> barriers for this kind of stuff ourselves, now for ARM and a couple
> other architectures they seem to have incorporated the barriers into
> writel and are expecting everyone to move over to writel_relaxed. 

You want to move to writel_relaxed() only if you know that your register
accesses won't have any side effects. 

If you require some memory update to be observable to the HW before
doing a register write, right thing is to do 

wmb() + writel_relaxed()

wmb() + writel() is clearly the wrong choice and that's what the goal of
this change.

If we know that all writel() adaptations in all architectures guarantee
observability, another option is to get rid of wmb() and just keep writel().

I'm not so convinced about this and hoping that someone will correct me. 

wmb() on x86 seems to have an sfence but writel() seems to have a compiler
barrier in it. So, the type of barrier wmb() is using is different. 

we can't say

(wmb()+ writel_relaxed()) == writel()

for all architectures, but maybe I'm wrong.

We really don't want to convert all writel() to writel_relaxed() blindly
without giving much thought into it.

This was also another reason why I limited the changes to wmb() + writel()
combinations only.

If there is wmb() + code + writel() and if we convert this to wmb() + code +
writel_relaxed(), code will not be observed by the HW and this might break
the driver.

> It
> seems like instead of going that route they should have probably just
> looked at pushing the ARM drivers to something like a "writel_strict"
> and adopted the behavior of the other architectures for writel.
> 
> I'll go back through and review. It looks like a number of items were missed.
> 

OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
I wanted to raise it here first before inspecting the rest.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15  2:17               ` Sinan Kaya
  (?)
@ 2018-03-15 14:32                 ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 14:32 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Wed, Mar 14, 2018 at 7:17 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:44 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>>> Hi Alexander,
>>>
>>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>>
>>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>>> Actually I would argue this whole patch set is pointless. For starters
>>>> why is it we are only updating the Intel Ethernet drivers here?
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers directory.
>>> I scrubbed the ones I care about and posted this series. Note also that
>>> I have one Infiniband patch in the series.
>>
>> I didn't see it as it I was only looking at the patches that had ended
>> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
>> find that on LKML.
>
> Yeah, I didn't have a cover page. These patches were sitting on my branch
> for a while. I wanted to get them out without putting too much effort into
> it. I'll add it on the next version.
>
>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>
>> It might be advisable to break things up to subsystem or family. So
>> for example if you are going to update the Intel Ethernet drivers I
>> would focus on that and maybe spin the infiniband patch of into a
>> separate set that can be applied to a separate tree. This is something
>> I would consider more of a driver optimization than a fix. In our case
>> it makes it easier for us to maintain the patches to the Intel drivers
>> if you could submit just those to Jeff and Intel-wired-lan so that we
>> can take care of test and review as well as figure out what other
>> drivers will would still need to update in order to handle all the
>> cases involved in this.
>>
>>>> This
>>>> seems like something that is going to impact the whole kernel tree
>>>> since many of us have been writing drivers for some time assuming x86
>>>> style behavior.
>>>
>>> That's true. We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>
>> My only real objection is that you are going to be having to scrub
>> pretty much ALL the drivers. It seems a little like trying to fix a
>> bad tire on your car by paving the road to match the shape of the
>> tire.
>
> Or we start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> We could do several iterations like you are suggesting for each subsystem.
>
>>
>>>>
>>>> It doesn't make sense to be optimizing the drivers for one subset of
>>>> architectures. The scope of the work needed to update the drivers for
>>>> this would be ridiculous. Also I don't see how this could be expected
>>>> to work on any other architecture when we pretty much need to have a
>>>> wmb() before calling the writel on x86 to deal with accesses between
>>>> coherent and non-coherent memory. It seems to me more like somebody
>>>> added what they considered to be an optimization somewhere that is a
>>>> workaround for a poorly written driver. Either that or the barrier is
>>>> serving a different purpose then the one we were using.
>>>
>>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>>> writel_relaxed()? I thought everything is well described in barriers
>>> document about what to expect from these APIs.
>>>
>>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>>> It doesn't really change anything for x86 but it saves barriers on
>>> other architectures.
>>
>> Yeah. I had to go through and do some review since my concerns have
>> been PowerPC, IA64, and x86 historically. From what I can tell all
>> those architectures are setup the same way so that shouldn't be an
>> issue.
>
> OK, glad that we are in common understanding.
>
>>
>>>>
>>>> It would make more sense to put in the effort making writel and
>>>> writel_relaxed consistent between architectures before we go through
>>>> and start modifying driver code to support different architectures.
>>>>
>>>
>>> Is there an arch problem that I'm not seeing?
>>>
>>> Sinan
>>
>> It isn't really an arch problem I have as a logistical one. It just
>> seems like this is really backwards in terms of how this has been
>> handled. For the x86 we have historically had to deal with the
>> barriers for this kind of stuff ourselves, now for ARM and a couple
>> other architectures they seem to have incorporated the barriers into
>> writel and are expecting everyone to move over to writel_relaxed.
>
> You want to move to writel_relaxed() only if you know that your register
> accesses won't have any side effects.
>
> If you require some memory update to be observable to the HW before
> doing a register write, right thing is to do
>
> wmb() + writel_relaxed()
>
> wmb() + writel() is clearly the wrong choice and that's what the goal of
> this change.
>
> If we know that all writel() adaptations in all architectures guarantee
> observability, another option is to get rid of wmb() and just keep writel().
>
> I'm not so convinced about this and hoping that someone will correct me.
>
> wmb() on x86 seems to have an sfence but writel() seems to have a compiler
> barrier in it. So, the type of barrier wmb() is using is different.
>
> we can't say
>
> (wmb()+ writel_relaxed()) == writel()
>
> for all architectures, but maybe I'm wrong.
>
> We really don't want to convert all writel() to writel_relaxed() blindly
> without giving much thought into it.
>
> This was also another reason why I limited the changes to wmb() + writel()
> combinations only.
>
> If there is wmb() + code + writel() and if we convert this to wmb() + code +
> writel_relaxed(), code will not be observed by the HW and this might break
> the driver.

So that is where things will be a bit trickier to understand from the
perspective of someone who hasn't worked in our drivers for a while.

We tend to do something like:
  update tx_buffer_info
  update tx_desc
  wmb()
  point first tx_buffer_info next_to_watch value at last tx_desc
  update next_to_use
  notify device via writel

We do it this way because we have to synchronize between the Tx
cleanup path and the hardware so we basically lump the two barriers
together. instead of invoking both a smp_wmb and a wmb. Now that I
look at the pseudocode though I wonder if we shouldn't move the
next_to_use update before the wmb, but that might be material for
another patch. Anyway, in the Tx cleanup path we should have an
smp_rmb() after we read the next_to_watch values so that we avoid
reading any of the other fields in the buffer_info if either the field
is NULL or the descriptor pointed to has not been written back.

>> It
>> seems like instead of going that route they should have probably just
>> looked at pushing the ARM drivers to something like a "writel_strict"
>> and adopted the behavior of the other architectures for writel.
>>
>> I'll go back through and review. It looks like a number of items were missed.
>>
>
> OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
> I wanted to raise it here first before inspecting the rest.

In the case of the Intel drivers it is pretty straightforward as most
of the drivers wrap the writel usage in a macro with the exception of
hot-path areas. In my review feedback I only focused on the tail
updates for the Tx and Rx rings. We have a few other spots where
writel is used to update the interrupt registers but I figure we can
leave those as-is for now as we would want to guarantee ordering of
the tail writes versus the register writes to re-enable the interrupt
for the queue.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 14:32                 ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Mar 14, 2018 at 7:17 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:44 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>>> Hi Alexander,
>>>
>>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>>
>>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>>> Actually I would argue this whole patch set is pointless. For starters
>>>> why is it we are only updating the Intel Ethernet drivers here?
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers directory.
>>> I scrubbed the ones I care about and posted this series. Note also that
>>> I have one Infiniband patch in the series.
>>
>> I didn't see it as it I was only looking at the patches that had ended
>> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
>> find that on LKML.
>
> Yeah, I didn't have a cover page. These patches were sitting on my branch
> for a while. I wanted to get them out without putting too much effort into
> it. I'll add it on the next version.
>
>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>
>> It might be advisable to break things up to subsystem or family. So
>> for example if you are going to update the Intel Ethernet drivers I
>> would focus on that and maybe spin the infiniband patch of into a
>> separate set that can be applied to a separate tree. This is something
>> I would consider more of a driver optimization than a fix. In our case
>> it makes it easier for us to maintain the patches to the Intel drivers
>> if you could submit just those to Jeff and Intel-wired-lan so that we
>> can take care of test and review as well as figure out what other
>> drivers will would still need to update in order to handle all the
>> cases involved in this.
>>
>>>> This
>>>> seems like something that is going to impact the whole kernel tree
>>>> since many of us have been writing drivers for some time assuming x86
>>>> style behavior.
>>>
>>> That's true. We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>
>> My only real objection is that you are going to be having to scrub
>> pretty much ALL the drivers. It seems a little like trying to fix a
>> bad tire on your car by paving the road to match the shape of the
>> tire.
>
> Or we start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> We could do several iterations like you are suggesting for each subsystem.
>
>>
>>>>
>>>> It doesn't make sense to be optimizing the drivers for one subset of
>>>> architectures. The scope of the work needed to update the drivers for
>>>> this would be ridiculous. Also I don't see how this could be expected
>>>> to work on any other architecture when we pretty much need to have a
>>>> wmb() before calling the writel on x86 to deal with accesses between
>>>> coherent and non-coherent memory. It seems to me more like somebody
>>>> added what they considered to be an optimization somewhere that is a
>>>> workaround for a poorly written driver. Either that or the barrier is
>>>> serving a different purpose then the one we were using.
>>>
>>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>>> writel_relaxed()? I thought everything is well described in barriers
>>> document about what to expect from these APIs.
>>>
>>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>>> It doesn't really change anything for x86 but it saves barriers on
>>> other architectures.
>>
>> Yeah. I had to go through and do some review since my concerns have
>> been PowerPC, IA64, and x86 historically. From what I can tell all
>> those architectures are setup the same way so that shouldn't be an
>> issue.
>
> OK, glad that we are in common understanding.
>
>>
>>>>
>>>> It would make more sense to put in the effort making writel and
>>>> writel_relaxed consistent between architectures before we go through
>>>> and start modifying driver code to support different architectures.
>>>>
>>>
>>> Is there an arch problem that I'm not seeing?
>>>
>>> Sinan
>>
>> It isn't really an arch problem I have as a logistical one. It just
>> seems like this is really backwards in terms of how this has been
>> handled. For the x86 we have historically had to deal with the
>> barriers for this kind of stuff ourselves, now for ARM and a couple
>> other architectures they seem to have incorporated the barriers into
>> writel and are expecting everyone to move over to writel_relaxed.
>
> You want to move to writel_relaxed() only if you know that your register
> accesses won't have any side effects.
>
> If you require some memory update to be observable to the HW before
> doing a register write, right thing is to do
>
> wmb() + writel_relaxed()
>
> wmb() + writel() is clearly the wrong choice and that's what the goal of
> this change.
>
> If we know that all writel() adaptations in all architectures guarantee
> observability, another option is to get rid of wmb() and just keep writel().
>
> I'm not so convinced about this and hoping that someone will correct me.
>
> wmb() on x86 seems to have an sfence but writel() seems to have a compiler
> barrier in it. So, the type of barrier wmb() is using is different.
>
> we can't say
>
> (wmb()+ writel_relaxed()) == writel()
>
> for all architectures, but maybe I'm wrong.
>
> We really don't want to convert all writel() to writel_relaxed() blindly
> without giving much thought into it.
>
> This was also another reason why I limited the changes to wmb() + writel()
> combinations only.
>
> If there is wmb() + code + writel() and if we convert this to wmb() + code +
> writel_relaxed(), code will not be observed by the HW and this might break
> the driver.

So that is where things will be a bit trickier to understand from the
perspective of someone who hasn't worked in our drivers for a while.

We tend to do something like:
  update tx_buffer_info
  update tx_desc
  wmb()
  point first tx_buffer_info next_to_watch value at last tx_desc
  update next_to_use
  notify device via writel

We do it this way because we have to synchronize between the Tx
cleanup path and the hardware so we basically lump the two barriers
together. instead of invoking both a smp_wmb and a wmb. Now that I
look at the pseudocode though I wonder if we shouldn't move the
next_to_use update before the wmb, but that might be material for
another patch. Anyway, in the Tx cleanup path we should have an
smp_rmb() after we read the next_to_watch values so that we avoid
reading any of the other fields in the buffer_info if either the field
is NULL or the descriptor pointed to has not been written back.

>> It
>> seems like instead of going that route they should have probably just
>> looked at pushing the ARM drivers to something like a "writel_strict"
>> and adopted the behavior of the other architectures for writel.
>>
>> I'll go back through and review. It looks like a number of items were missed.
>>
>
> OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
> I wanted to raise it here first before inspecting the rest.

In the case of the Intel drivers it is pretty straightforward as most
of the drivers wrap the writel usage in a macro with the exception of
hot-path areas. In my review feedback I only focused on the tail
updates for the Tx and Rx rings. We have a few other spots where
writel is used to update the interrupt registers but I figure we can
leave those as-is for now as we would want to guarantee ordering of
the tail writes versus the register writes to re-enable the interrupt
for the queue.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 14:32                 ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 14:32 UTC (permalink / raw)
  To: intel-wired-lan

On Wed, Mar 14, 2018 at 7:17 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:44 PM, Alexander Duyck wrote:
>> On Wed, Mar 14, 2018 at 3:57 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>>> Hi Alexander,
>>>
>>> On 3/14/2018 5:49 PM, Alexander Duyck wrote:
>>>> On Wed, Mar 14, 2018 at 5:13 AM,  <okaya@codeaurora.org> wrote:
>>>>> On 2018-03-14 01:08, Timur Tabi wrote:
>>>>>>
>>>>>> On 3/13/18 10:20 PM, Sinan Kaya wrote:
>>>
>>>> Actually I would argue this whole patch set is pointless. For starters
>>>> why is it we are only updating the Intel Ethernet drivers here?
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers directory.
>>> I scrubbed the ones I care about and posted this series. Note also that
>>> I have one Infiniband patch in the series.
>>
>> I didn't see it as it I was only looking at the patches that had ended
>> up in intel-wired-lan. Also was there a cover page, I couldn't seem to
>> find that on LKML.
>
> Yeah, I didn't have a cover page. These patches were sitting on my branch
> for a while. I wanted to get them out without putting too much effort into
> it. I'll add it on the next version.
>
>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>
>> It might be advisable to break things up to subsystem or family. So
>> for example if you are going to update the Intel Ethernet drivers I
>> would focus on that and maybe spin the infiniband patch of into a
>> separate set that can be applied to a separate tree. This is something
>> I would consider more of a driver optimization than a fix. In our case
>> it makes it easier for us to maintain the patches to the Intel drivers
>> if you could submit just those to Jeff and Intel-wired-lan so that we
>> can take care of test and review as well as figure out what other
>> drivers will would still need to update in order to handle all the
>> cases involved in this.
>>
>>>> This
>>>> seems like something that is going to impact the whole kernel tree
>>>> since many of us have been writing drivers for some time assuming x86
>>>> style behavior.
>>>
>>> That's true. We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>
>> My only real objection is that you are going to be having to scrub
>> pretty much ALL the drivers. It seems a little like trying to fix a
>> bad tire on your car by paving the road to match the shape of the
>> tire.
>
> Or we start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> We could do several iterations like you are suggesting for each subsystem.
>
>>
>>>>
>>>> It doesn't make sense to be optimizing the drivers for one subset of
>>>> architectures. The scope of the work needed to update the drivers for
>>>> this would be ridiculous. Also I don't see how this could be expected
>>>> to work on any other architecture when we pretty much need to have a
>>>> wmb() before calling the writel on x86 to deal with accesses between
>>>> coherent and non-coherent memory. It seems to me more like somebody
>>>> added what they considered to be an optimization somewhere that is a
>>>> workaround for a poorly written driver. Either that or the barrier is
>>>> serving a different purpose then the one we were using.
>>>
>>> Is there a semantic problem with the definition of wmb() vs. writel() vs.
>>> writel_relaxed()? I thought everything is well described in barriers
>>> document about what to expect from these APIs.
>>>
>>> AFAIK, writel() is equal to writel_relaxed() on x86 architecture.
>>> It doesn't really change anything for x86 but it saves barriers on
>>> other architectures.
>>
>> Yeah. I had to go through and do some review since my concerns have
>> been PowerPC, IA64, and x86 historically. From what I can tell all
>> those architectures are setup the same way so that shouldn't be an
>> issue.
>
> OK, glad that we are in common understanding.
>
>>
>>>>
>>>> It would make more sense to put in the effort making writel and
>>>> writel_relaxed consistent between architectures before we go through
>>>> and start modifying driver code to support different architectures.
>>>>
>>>
>>> Is there an arch problem that I'm not seeing?
>>>
>>> Sinan
>>
>> It isn't really an arch problem I have as a logistical one. It just
>> seems like this is really backwards in terms of how this has been
>> handled. For the x86 we have historically had to deal with the
>> barriers for this kind of stuff ourselves, now for ARM and a couple
>> other architectures they seem to have incorporated the barriers into
>> writel and are expecting everyone to move over to writel_relaxed.
>
> You want to move to writel_relaxed() only if you know that your register
> accesses won't have any side effects.
>
> If you require some memory update to be observable to the HW before
> doing a register write, right thing is to do
>
> wmb() + writel_relaxed()
>
> wmb() + writel() is clearly the wrong choice and that's what the goal of
> this change.
>
> If we know that all writel() adaptations in all architectures guarantee
> observability, another option is to get rid of wmb() and just keep writel().
>
> I'm not so convinced about this and hoping that someone will correct me.
>
> wmb() on x86 seems to have an sfence but writel() seems to have a compiler
> barrier in it. So, the type of barrier wmb() is using is different.
>
> we can't say
>
> (wmb()+ writel_relaxed()) == writel()
>
> for all architectures, but maybe I'm wrong.
>
> We really don't want to convert all writel() to writel_relaxed() blindly
> without giving much thought into it.
>
> This was also another reason why I limited the changes to wmb() + writel()
> combinations only.
>
> If there is wmb() + code + writel() and if we convert this to wmb() + code +
> writel_relaxed(), code will not be observed by the HW and this might break
> the driver.

So that is where things will be a bit trickier to understand from the
perspective of someone who hasn't worked in our drivers for a while.

We tend to do something like:
  update tx_buffer_info
  update tx_desc
  wmb()
  point first tx_buffer_info next_to_watch value at last tx_desc
  update next_to_use
  notify device via writel

We do it this way because we have to synchronize between the Tx
cleanup path and the hardware so we basically lump the two barriers
together. instead of invoking both a smp_wmb and a wmb. Now that I
look at the pseudocode though I wonder if we shouldn't move the
next_to_use update before the wmb, but that might be material for
another patch. Anyway, in the Tx cleanup path we should have an
smp_rmb() after we read the next_to_watch values so that we avoid
reading any of the other fields in the buffer_info if either the field
is NULL or the descriptor pointed to has not been written back.

>> It
>> seems like instead of going that route they should have probably just
>> looked at pushing the ARM drivers to something like a "writel_strict"
>> and adopted the behavior of the other architectures for writel.
>>
>> I'll go back through and review. It looks like a number of items were missed.
>>
>
> OK, I'll take a look at each one. Saw the code concern on the e1000e suggestion.
> I wanted to raise it here first before inspecting the rest.

In the case of the Intel drivers it is pretty straightforward as most
of the drivers wrap the writel usage in a macro with the exception of
hot-path areas. In my review feedback I only focused on the tail
updates for the Tx and Rx rings. We have a few other spots where
writel is used to update the interrupt registers but I figure we can
leave those as-is for now as we would want to guarantee ordering of
the tail writes versus the register writes to re-enable the interrupt
for the queue.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15 14:32                 ` Alexander Duyck
  (?)
@ 2018-03-15 16:21                   ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:21 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On 3/15/2018 10:32 AM, Alexander Duyck wrote:
> We tend to do something like:
>   update tx_buffer_info
>   update tx_desc
>   wmb()
>   point first tx_buffer_info next_to_watch value at last tx_desc
>   update next_to_use
>   notify device via writel
> 
> We do it this way because we have to synchronize between the Tx
> cleanup path and the hardware so we basically lump the two barriers
> together. instead of invoking both a smp_wmb and a wmb. Now that I
> look at the pseudocode though I wonder if we shouldn't move the
> next_to_use update before the wmb, but that might be material for
> another patch. Anyway, in the Tx cleanup path we should have an
> smp_rmb() after we read the next_to_watch values so that we avoid
> reading any of the other fields in the buffer_info if either the field
> is NULL or the descriptor pointed to has not been written back.

How do you feel about keeping wmb() very close to writel_relaxed() like this?

   update tx_buffer_info
   update tx_desc
   point first tx_buffer_info next_to_watch value at last tx_desc
   update next_to_use
   wmb()
   notify device via writel_relaxed()

I'm afraid that if the order of wmb() and writel() is not very
obvious or hidden in multiple functions, somebody can introduce a very nasty
bug in the future.

We also have to think about code maintenance.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:21                   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/15/2018 10:32 AM, Alexander Duyck wrote:
> We tend to do something like:
>   update tx_buffer_info
>   update tx_desc
>   wmb()
>   point first tx_buffer_info next_to_watch value at last tx_desc
>   update next_to_use
>   notify device via writel
> 
> We do it this way because we have to synchronize between the Tx
> cleanup path and the hardware so we basically lump the two barriers
> together. instead of invoking both a smp_wmb and a wmb. Now that I
> look at the pseudocode though I wonder if we shouldn't move the
> next_to_use update before the wmb, but that might be material for
> another patch. Anyway, in the Tx cleanup path we should have an
> smp_rmb() after we read the next_to_watch values so that we avoid
> reading any of the other fields in the buffer_info if either the field
> is NULL or the descriptor pointed to has not been written back.

How do you feel about keeping wmb() very close to writel_relaxed() like this?

   update tx_buffer_info
   update tx_desc
   point first tx_buffer_info next_to_watch value at last tx_desc
   update next_to_use
   wmb()
   notify device via writel_relaxed()

I'm afraid that if the order of wmb() and writel() is not very
obvious or hidden in multiple functions, somebody can introduce a very nasty
bug in the future.

We also have to think about code maintenance.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:21                   ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:21 UTC (permalink / raw)
  To: intel-wired-lan

On 3/15/2018 10:32 AM, Alexander Duyck wrote:
> We tend to do something like:
>   update tx_buffer_info
>   update tx_desc
>   wmb()
>   point first tx_buffer_info next_to_watch value at last tx_desc
>   update next_to_use
>   notify device via writel
> 
> We do it this way because we have to synchronize between the Tx
> cleanup path and the hardware so we basically lump the two barriers
> together. instead of invoking both a smp_wmb and a wmb. Now that I
> look at the pseudocode though I wonder if we shouldn't move the
> next_to_use update before the wmb, but that might be material for
> another patch. Anyway, in the Tx cleanup path we should have an
> smp_rmb() after we read the next_to_watch values so that we avoid
> reading any of the other fields in the buffer_info if either the field
> is NULL or the descriptor pointed to has not been written back.

How do you feel about keeping wmb() very close to writel_relaxed() like this?

   update tx_buffer_info
   update tx_desc
   point first tx_buffer_info next_to_watch value at last tx_desc
   update next_to_use
   wmb()
   notify device via writel_relaxed()

I'm afraid that if the order of wmb() and writel() is not very
obvious or hidden in multiple functions, somebody can introduce a very nasty
bug in the future.

We also have to think about code maintenance.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15 16:21                   ` Sinan Kaya
  (?)
@ 2018-03-15 16:27                     ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:27 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On 3/15/2018 12:21 PM, Sinan Kaya wrote:
> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>> We tend to do something like:
>>   update tx_buffer_info
>>   update tx_desc
>>   wmb()
>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>   update next_to_use
>>   notify device via writel
>>
>> We do it this way because we have to synchronize between the Tx
>> cleanup path and the hardware so we basically lump the two barriers
>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>> look at the pseudocode though I wonder if we shouldn't move the
>> next_to_use update before the wmb, but that might be material for
>> another patch. Anyway, in the Tx cleanup path we should have an
>> smp_rmb() after we read the next_to_watch values so that we avoid
>> reading any of the other fields in the buffer_info if either the field
>> is NULL or the descriptor pointed to has not been written back.
> 
> How do you feel about keeping wmb() very close to writel_relaxed() like this?
> 
>    update tx_buffer_info
>    update tx_desc
>    point first tx_buffer_info next_to_watch value at last tx_desc
>    update next_to_use
>    wmb()
>    notify device via writel_relaxed()
> 
> I'm afraid that if the order of wmb() and writel() is not very
> obvious or hidden in multiple functions, somebody can introduce a very nasty
> bug in the future.
> 
> We also have to think about code maintenance.
> 

Now that I read your email again, I think this is the reason if I understood you
correctly. 

"instead of invoking both a smp_wmb and a wmb"

You'd need something like

    update tx_buffer_info
    update tx_desc
    smp_wmb()
    point first tx_buffer_info next_to_watch value at last tx_desc
    update next_to_use
    wmb()
    notify device via writel_relaxed()

Let me work on your comments.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:27                     ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/15/2018 12:21 PM, Sinan Kaya wrote:
> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>> We tend to do something like:
>>   update tx_buffer_info
>>   update tx_desc
>>   wmb()
>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>   update next_to_use
>>   notify device via writel
>>
>> We do it this way because we have to synchronize between the Tx
>> cleanup path and the hardware so we basically lump the two barriers
>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>> look at the pseudocode though I wonder if we shouldn't move the
>> next_to_use update before the wmb, but that might be material for
>> another patch. Anyway, in the Tx cleanup path we should have an
>> smp_rmb() after we read the next_to_watch values so that we avoid
>> reading any of the other fields in the buffer_info if either the field
>> is NULL or the descriptor pointed to has not been written back.
> 
> How do you feel about keeping wmb() very close to writel_relaxed() like this?
> 
>    update tx_buffer_info
>    update tx_desc
>    point first tx_buffer_info next_to_watch value at last tx_desc
>    update next_to_use
>    wmb()
>    notify device via writel_relaxed()
> 
> I'm afraid that if the order of wmb() and writel() is not very
> obvious or hidden in multiple functions, somebody can introduce a very nasty
> bug in the future.
> 
> We also have to think about code maintenance.
> 

Now that I read your email again, I think this is the reason if I understood you
correctly. 

"instead of invoking both a smp_wmb and a wmb"

You'd need something like

    update tx_buffer_info
    update tx_desc
    smp_wmb()
    point first tx_buffer_info next_to_watch value at last tx_desc
    update next_to_use
    wmb()
    notify device via writel_relaxed()

Let me work on your comments.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:27                     ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 16:27 UTC (permalink / raw)
  To: intel-wired-lan

On 3/15/2018 12:21 PM, Sinan Kaya wrote:
> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>> We tend to do something like:
>>   update tx_buffer_info
>>   update tx_desc
>>   wmb()
>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>   update next_to_use
>>   notify device via writel
>>
>> We do it this way because we have to synchronize between the Tx
>> cleanup path and the hardware so we basically lump the two barriers
>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>> look at the pseudocode though I wonder if we shouldn't move the
>> next_to_use update before the wmb, but that might be material for
>> another patch. Anyway, in the Tx cleanup path we should have an
>> smp_rmb() after we read the next_to_watch values so that we avoid
>> reading any of the other fields in the buffer_info if either the field
>> is NULL or the descriptor pointed to has not been written back.
> 
> How do you feel about keeping wmb() very close to writel_relaxed() like this?
> 
>    update tx_buffer_info
>    update tx_desc
>    point first tx_buffer_info next_to_watch value at last tx_desc
>    update next_to_use
>    wmb()
>    notify device via writel_relaxed()
> 
> I'm afraid that if the order of wmb() and writel() is not very
> obvious or hidden in multiple functions, somebody can introduce a very nasty
> bug in the future.
> 
> We also have to think about code maintenance.
> 

Now that I read your email again, I think this is the reason if I understood you
correctly. 

"instead of invoking both a smp_wmb and a wmb"

You'd need something like

    update tx_buffer_info
    update tx_desc
    smp_wmb()
    point first tx_buffer_info next_to_watch value at last tx_desc
    update next_to_use
    wmb()
    notify device via writel_relaxed()

Let me work on your comments.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15 16:27                     ` Sinan Kaya
  (?)
@ 2018-03-15 16:58                       ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 16:58 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Thu, Mar 15, 2018 at 9:27 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/15/2018 12:21 PM, Sinan Kaya wrote:
>> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>>> We tend to do something like:
>>>   update tx_buffer_info
>>>   update tx_desc
>>>   wmb()
>>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>>   update next_to_use
>>>   notify device via writel
>>>
>>> We do it this way because we have to synchronize between the Tx
>>> cleanup path and the hardware so we basically lump the two barriers
>>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>>> look at the pseudocode though I wonder if we shouldn't move the
>>> next_to_use update before the wmb, but that might be material for
>>> another patch. Anyway, in the Tx cleanup path we should have an
>>> smp_rmb() after we read the next_to_watch values so that we avoid
>>> reading any of the other fields in the buffer_info if either the field
>>> is NULL or the descriptor pointed to has not been written back.
>>
>> How do you feel about keeping wmb() very close to writel_relaxed() like this?
>>
>>    update tx_buffer_info
>>    update tx_desc
>>    point first tx_buffer_info next_to_watch value at last tx_desc
>>    update next_to_use
>>    wmb()
>>    notify device via writel_relaxed()
>>
>> I'm afraid that if the order of wmb() and writel() is not very
>> obvious or hidden in multiple functions, somebody can introduce a very nasty
>> bug in the future.
>>
>> We also have to think about code maintenance.
>>
>
> Now that I read your email again, I think this is the reason if I understood you
> correctly.
>
> "instead of invoking both a smp_wmb and a wmb"
>
> You'd need something like
>
>     update tx_buffer_info
>     update tx_desc
>     smp_wmb()
>     point first tx_buffer_info next_to_watch value at last tx_desc
>     update next_to_use
>     wmb()
>     notify device via writel_relaxed()
>
> Let me work on your comments.

Yes, we would be doing something like that, but we are using just a
single wmb() to cover both cases since hardware will never look at the
tx_buffer_info and software will never read that descriptor ring as
long as the next_to_watch is NULL. By doing it this way we should have
both cases covered and not need to worry

The only other bit still remaining is the "maybe_stop_tx" logic which
lives between the wmb and writel_relaxed. That logic has a smp_mb
living in it that is triggered if we have to stop the queue. Once
again though that is only viewed by software so it existing between
the wmb and the writel_relaxed should not be an issue.

Starting to understand why I was a bit hesitant to have us start
taking on these changes now? :-)

- Alex

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:58                       ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 15, 2018 at 9:27 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/15/2018 12:21 PM, Sinan Kaya wrote:
>> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>>> We tend to do something like:
>>>   update tx_buffer_info
>>>   update tx_desc
>>>   wmb()
>>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>>   update next_to_use
>>>   notify device via writel
>>>
>>> We do it this way because we have to synchronize between the Tx
>>> cleanup path and the hardware so we basically lump the two barriers
>>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>>> look at the pseudocode though I wonder if we shouldn't move the
>>> next_to_use update before the wmb, but that might be material for
>>> another patch. Anyway, in the Tx cleanup path we should have an
>>> smp_rmb() after we read the next_to_watch values so that we avoid
>>> reading any of the other fields in the buffer_info if either the field
>>> is NULL or the descriptor pointed to has not been written back.
>>
>> How do you feel about keeping wmb() very close to writel_relaxed() like this?
>>
>>    update tx_buffer_info
>>    update tx_desc
>>    point first tx_buffer_info next_to_watch value at last tx_desc
>>    update next_to_use
>>    wmb()
>>    notify device via writel_relaxed()
>>
>> I'm afraid that if the order of wmb() and writel() is not very
>> obvious or hidden in multiple functions, somebody can introduce a very nasty
>> bug in the future.
>>
>> We also have to think about code maintenance.
>>
>
> Now that I read your email again, I think this is the reason if I understood you
> correctly.
>
> "instead of invoking both a smp_wmb and a wmb"
>
> You'd need something like
>
>     update tx_buffer_info
>     update tx_desc
>     smp_wmb()
>     point first tx_buffer_info next_to_watch value at last tx_desc
>     update next_to_use
>     wmb()
>     notify device via writel_relaxed()
>
> Let me work on your comments.

Yes, we would be doing something like that, but we are using just a
single wmb() to cover both cases since hardware will never look at the
tx_buffer_info and software will never read that descriptor ring as
long as the next_to_watch is NULL. By doing it this way we should have
both cases covered and not need to worry

The only other bit still remaining is the "maybe_stop_tx" logic which
lives between the wmb and writel_relaxed. That logic has a smp_mb
living in it that is triggered if we have to stop the queue. Once
again though that is only viewed by software so it existing between
the wmb and the writel_relaxed should not be an issue.

Starting to understand why I was a bit hesitant to have us start
taking on these changes now? :-)

- Alex

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 16:58                       ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15 16:58 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Mar 15, 2018 at 9:27 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/15/2018 12:21 PM, Sinan Kaya wrote:
>> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>>> We tend to do something like:
>>>   update tx_buffer_info
>>>   update tx_desc
>>>   wmb()
>>>   point first tx_buffer_info next_to_watch value at last tx_desc
>>>   update next_to_use
>>>   notify device via writel
>>>
>>> We do it this way because we have to synchronize between the Tx
>>> cleanup path and the hardware so we basically lump the two barriers
>>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>>> look at the pseudocode though I wonder if we shouldn't move the
>>> next_to_use update before the wmb, but that might be material for
>>> another patch. Anyway, in the Tx cleanup path we should have an
>>> smp_rmb() after we read the next_to_watch values so that we avoid
>>> reading any of the other fields in the buffer_info if either the field
>>> is NULL or the descriptor pointed to has not been written back.
>>
>> How do you feel about keeping wmb() very close to writel_relaxed() like this?
>>
>>    update tx_buffer_info
>>    update tx_desc
>>    point first tx_buffer_info next_to_watch value at last tx_desc
>>    update next_to_use
>>    wmb()
>>    notify device via writel_relaxed()
>>
>> I'm afraid that if the order of wmb() and writel() is not very
>> obvious or hidden in multiple functions, somebody can introduce a very nasty
>> bug in the future.
>>
>> We also have to think about code maintenance.
>>
>
> Now that I read your email again, I think this is the reason if I understood you
> correctly.
>
> "instead of invoking both a smp_wmb and a wmb"
>
> You'd need something like
>
>     update tx_buffer_info
>     update tx_desc
>     smp_wmb()
>     point first tx_buffer_info next_to_watch value at last tx_desc
>     update next_to_use
>     wmb()
>     notify device via writel_relaxed()
>
> Let me work on your comments.

Yes, we would be doing something like that, but we are using just a
single wmb() to cover both cases since hardware will never look at the
tx_buffer_info and software will never read that descriptor ring as
long as the next_to_watch is NULL. By doing it this way we should have
both cases covered and not need to worry

The only other bit still remaining is the "maybe_stop_tx" logic which
lives between the wmb and writel_relaxed. That logic has a smp_mb
living in it that is triggered if we have to stop the queue. Once
again though that is only viewed by software so it existing between
the wmb and the writel_relaxed should not be an issue.

Starting to understand why I was a bit hesitant to have us start
taking on these changes now? :-)

- Alex

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:20   ` Sinan Kaya
@ 2018-03-15 22:23     ` Jason Gunthorpe
  -1 siblings, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2018-03-15 22:23 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Michal Kalderon, Ariel Elior, Doug Ledford, linux-rdma,
	linux-kernel

On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied to RDMA for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 3/7] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 22:23     ` Jason Gunthorpe
  0 siblings, 0 replies; 85+ messages in thread
From: Jason Gunthorpe @ 2018-03-15 22:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 11:20:24PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied to RDMA for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15  1:41     ` Alexander Duyck
  (?)
@ 2018-03-15 23:30       ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 23:30 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>  }
>>
> So you missed the writel in e1000_xmit_frame. You should probably get
> that one too while you are doing these updates. The wmb() is in
> e1000_tx_queue().
> 

I brought wmb() outside along with the next descriptor assignment to be
similar to the rest of the other code.

if wmb() and writel() are not visible in the same function, let's not touch
the code.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 23:30       ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 23:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>  }
>>
> So you missed the writel in e1000_xmit_frame. You should probably get
> that one too while you are doing these updates. The wmb() is in
> e1000_tx_queue().
> 

I brought wmb() outside along with the next descriptor assignment to be
similar to the rest of the other code.

if wmb() and writel() are not visible in the same function, let's not touch
the code.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15 23:30       ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-15 23:30 UTC (permalink / raw)
  To: intel-wired-lan

On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>  }
>>
> So you missed the writel in e1000_xmit_frame. You should probably get
> that one too while you are doing these updates. The wmb() is in
> e1000_tx_queue().
> 

I brought wmb() outside along with the next descriptor assignment to be
similar to the rest of the other code.

if wmb() and writel() are not visible in the same function, let's not touch
the code.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
  2018-03-15 23:30       ` Sinan Kaya
  (?)
@ 2018-03-16  0:25         ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-16  0:25 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>  }
>>>
>> So you missed the writel in e1000_xmit_frame. You should probably get
>> that one too while you are doing these updates. The wmb() is in
>> e1000_tx_queue().
>>
>
> I brought wmb() outside along with the next descriptor assignment to be
> similar to the rest of the other code.
>
> if wmb() and writel() are not visible in the same function, let's not touch
> the code.

Maybe for e1000 we should just skip the driver entirely. Odds are you
aren't going to have any e1000 parts running on ARM anyway since most
of them are legacy PCI or PCI-X parts that were made over 10 years
ago. Most of your efforts would probably be best spent on igb, igbvf,
ixgbe, ixgbevf, i40e, i40evf, and fm10k.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-16  0:25         ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-16  0:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>  }
>>>
>> So you missed the writel in e1000_xmit_frame. You should probably get
>> that one too while you are doing these updates. The wmb() is in
>> e1000_tx_queue().
>>
>
> I brought wmb() outside along with the next descriptor assignment to be
> similar to the rest of the other code.
>
> if wmb() and writel() are not visible in the same function, let's not touch
> the code.

Maybe for e1000 we should just skip the driver entirely. Odds are you
aren't going to have any e1000 parts running on ARM anyway since most
of them are legacy PCI or PCI-X parts that were made over 10 years
ago. Most of your efforts would probably be best spent on igb, igbvf,
ixgbe, ixgbevf, i40e, i40evf, and fm10k.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-16  0:25         ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-16  0:25 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>  }
>>>
>> So you missed the writel in e1000_xmit_frame. You should probably get
>> that one too while you are doing these updates. The wmb() is in
>> e1000_tx_queue().
>>
>
> I brought wmb() outside along with the next descriptor assignment to be
> similar to the rest of the other code.
>
> if wmb() and writel() are not visible in the same function, let's not touch
> the code.

Maybe for e1000 we should just skip the driver entirely. Odds are you
aren't going to have any e1000 parts running on ARM anyway since most
of them are legacy PCI or PCI-X parts that were made over 10 years
ago. Most of your efforts would probably be best spent on igb, igbvf,
ixgbe, ixgbevf, i40e, i40evf, and fm10k.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
  2018-03-16  0:25         ` Alexander Duyck
  (?)
@ 2018-03-16  0:50           ` Sinan Kaya
  -1 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-16  0:50 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On 3/15/2018 8:25 PM, Alexander Duyck wrote:
> On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>>  }
>>>>
>>> So you missed the writel in e1000_xmit_frame. You should probably get
>>> that one too while you are doing these updates. The wmb() is in
>>> e1000_tx_queue().
>>>
>>
>> I brought wmb() outside along with the next descriptor assignment to be
>> similar to the rest of the other code.
>>
>> if wmb() and writel() are not visible in the same function, let's not touch
>> the code.
> 
> Maybe for e1000 we should just skip the driver entirely. Odds are you
> aren't going to have any e1000 parts running on ARM anyway since most
> of them are legacy PCI or PCI-X parts that were made over 10 years
> ago. Most of your efforts would probably be best spent on igb, igbvf,
> ixgbe, ixgbevf, i40e, i40evf, and fm10k.
> 

Sure. I'll drop it.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-16  0:50           ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-16  0:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/15/2018 8:25 PM, Alexander Duyck wrote:
> On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>>  }
>>>>
>>> So you missed the writel in e1000_xmit_frame. You should probably get
>>> that one too while you are doing these updates. The wmb() is in
>>> e1000_tx_queue().
>>>
>>
>> I brought wmb() outside along with the next descriptor assignment to be
>> similar to the rest of the other code.
>>
>> if wmb() and writel() are not visible in the same function, let's not touch
>> the code.
> 
> Maybe for e1000 we should just skip the driver entirely. Odds are you
> aren't going to have any e1000 parts running on ARM anyway since most
> of them are legacy PCI or PCI-X parts that were made over 10 years
> ago. Most of your efforts would probably be best spent on igb, igbvf,
> ixgbe, ixgbevf, i40e, i40evf, and fm10k.
> 

Sure. I'll drop it.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [Intel-wired-lan] [PATCH 6/7] e1000: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-16  0:50           ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-16  0:50 UTC (permalink / raw)
  To: intel-wired-lan

On 3/15/2018 8:25 PM, Alexander Duyck wrote:
> On Thu, Mar 15, 2018 at 4:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> On 3/14/2018 9:41 PM, Alexander Duyck wrote:
>>>>  }
>>>>
>>> So you missed the writel in e1000_xmit_frame. You should probably get
>>> that one too while you are doing these updates. The wmb() is in
>>> e1000_tx_queue().
>>>
>>
>> I brought wmb() outside along with the next descriptor assignment to be
>> similar to the rest of the other code.
>>
>> if wmb() and writel() are not visible in the same function, let's not touch
>> the code.
> 
> Maybe for e1000 we should just skip the driver entirely. Odds are you
> aren't going to have any e1000 parts running on ARM anyway since most
> of them are legacy PCI or PCI-X parts that were made over 10 years
> ago. Most of your efforts would probably be best spent on igb, igbvf,
> ixgbe, ixgbevf, i40e, i40evf, and fm10k.
> 

Sure. I'll drop it.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-14  3:19 ` Sinan Kaya
@ 2018-03-15  1:53   ` Alexander Duyck
  -1 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:53 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel,
	Jeff Kirsher, intel-wired-lan, LKML

On Tue, Mar 13, 2018 at 8:19 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index e554aa6cf..7028516 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>          * such as IA-64).
>          */
>         wmb();
> -       writel(val, rx_ring->tail);
> +       writel_relaxed(val, rx_ring->tail);
>  }
>
>  /**
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> index 357d605..2d323fc 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> @@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>          * such as IA-64).
>          */
>         wmb();
> -       writel(val, rx_ring->tail);
> +       writel_relaxed(val, rx_ring->tail);
>  }
>

So this one missed writel calls in:
i40e:
  i40e_program_fdir_filter
  i40e_clean_rx_irq
  i40e_tx_map
i40evf:
  i40e_clean_rx_irq
  i40e_tx_map

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-15  1:53   ` Alexander Duyck
  0 siblings, 0 replies; 85+ messages in thread
From: Alexander Duyck @ 2018-03-15  1:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 13, 2018 at 8:19 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier
> on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index e554aa6cf..7028516 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>          * such as IA-64).
>          */
>         wmb();
> -       writel(val, rx_ring->tail);
> +       writel_relaxed(val, rx_ring->tail);
>  }
>
>  /**
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> index 357d605..2d323fc 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> @@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>          * such as IA-64).
>          */
>         wmb();
> -       writel(val, rx_ring->tail);
> +       writel_relaxed(val, rx_ring->tail);
>  }
>

So this one missed writel calls in:
i40e:
  i40e_program_fdir_filter
  i40e_clean_rx_irq
  i40e_tx_map
i40evf:
  i40e_clean_rx_irq
  i40e_tx_map

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:19 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:19 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Jeff Kirsher,
	intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-14  3:19 ` Sinan Kaya
  0 siblings, 0 replies; 85+ messages in thread
From: Sinan Kaya @ 2018-03-14  3:19 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..7028516 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..2d323fc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 85+ messages in thread

end of thread, other threads:[~2018-03-16  0:50 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-14  3:20 [PATCH 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-14  3:20 ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20 ` Sinan Kaya
2018-03-14  3:20 ` Sinan Kaya
2018-03-14  3:20 ` [PATCH 2/7] ixgbe: eliminate " Sinan Kaya
2018-03-14  3:20   ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-15  1:47   ` Alexander Duyck
2018-03-15  1:47     ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:47     ` Alexander Duyck
2018-03-14  3:20 ` [PATCH 3/7] RDMA/qedr: " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-14  4:12   ` Jason Gunthorpe
2018-03-14  4:12     ` Jason Gunthorpe
2018-03-14 12:06     ` okaya
2018-03-14 12:06       ` okaya at codeaurora.org
2018-03-15 22:23   ` Jason Gunthorpe
2018-03-15 22:23     ` Jason Gunthorpe
2018-03-14  3:20 ` [PATCH 4/7] igbvf: " Sinan Kaya
2018-03-14  3:20   ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-15  1:48   ` Alexander Duyck
2018-03-15  1:48     ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:48     ` Alexander Duyck
2018-03-14  3:20 ` [PATCH 5/7] igb: " Sinan Kaya
2018-03-14  3:20   ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-15  1:50   ` Alexander Duyck
2018-03-15  1:50     ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:50     ` Alexander Duyck
2018-03-14  3:20 ` [PATCH 6/7] e1000: " Sinan Kaya
2018-03-14  3:20   ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-15  1:41   ` Alexander Duyck
2018-03-15  1:41     ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:41     ` Alexander Duyck
2018-03-15 23:30     ` Sinan Kaya
2018-03-15 23:30       ` [Intel-wired-lan] " Sinan Kaya
2018-03-15 23:30       ` Sinan Kaya
2018-03-16  0:25       ` Alexander Duyck
2018-03-16  0:25         ` [Intel-wired-lan] " Alexander Duyck
2018-03-16  0:25         ` Alexander Duyck
2018-03-16  0:50         ` Sinan Kaya
2018-03-16  0:50           ` [Intel-wired-lan] " Sinan Kaya
2018-03-16  0:50           ` Sinan Kaya
2018-03-14  3:20 ` [PATCH 7/7] ixgbevf: " Sinan Kaya
2018-03-14  3:20   ` [Intel-wired-lan] " Sinan Kaya
2018-03-14  3:20   ` Sinan Kaya
2018-03-14  5:08   ` Timur Tabi
2018-03-14  5:08     ` [Intel-wired-lan] " Timur Tabi
2018-03-14  5:08     ` Timur Tabi
2018-03-14 12:13     ` okaya
2018-03-14 12:13       ` [Intel-wired-lan] " okaya
2018-03-14 12:13       ` okaya at codeaurora.org
2018-03-14 21:49       ` Alexander Duyck
2018-03-14 21:49         ` [Intel-wired-lan] " Alexander Duyck
2018-03-14 21:49         ` Alexander Duyck
2018-03-14 22:57         ` Sinan Kaya
2018-03-14 22:57           ` [Intel-wired-lan] " Sinan Kaya
2018-03-14 22:57           ` Sinan Kaya
2018-03-15  1:44           ` Alexander Duyck
2018-03-15  1:44             ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:44             ` Alexander Duyck
2018-03-15  2:17             ` Sinan Kaya
2018-03-15  2:17               ` [Intel-wired-lan] " Sinan Kaya
2018-03-15  2:17               ` Sinan Kaya
2018-03-15 14:32               ` Alexander Duyck
2018-03-15 14:32                 ` [Intel-wired-lan] " Alexander Duyck
2018-03-15 14:32                 ` Alexander Duyck
2018-03-15 16:21                 ` Sinan Kaya
2018-03-15 16:21                   ` [Intel-wired-lan] " Sinan Kaya
2018-03-15 16:21                   ` Sinan Kaya
2018-03-15 16:27                   ` Sinan Kaya
2018-03-15 16:27                     ` [Intel-wired-lan] " Sinan Kaya
2018-03-15 16:27                     ` Sinan Kaya
2018-03-15 16:58                     ` Alexander Duyck
2018-03-15 16:58                       ` [Intel-wired-lan] " Alexander Duyck
2018-03-15 16:58                       ` Alexander Duyck
2018-03-15  1:45   ` Alexander Duyck
2018-03-15  1:45     ` [Intel-wired-lan] " Alexander Duyck
2018-03-15  1:45     ` Alexander Duyck
  -- strict thread matches above, loose matches on Subject: below --
2018-03-14  3:19 [PATCH 1/7] i40e/i40evf: Eliminate " Sinan Kaya
2018-03-14  3:19 ` Sinan Kaya
2018-03-15  1:53 ` Alexander Duyck
2018-03-15  1:53   ` Alexander Duyck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.