All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47 ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich; +Cc: Sinan Kaya, linux-arm-msm, linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v3:
- https://www.spinics.net/lists/arm-kernel/msg641851.html
- group patches together into subsystems ib:...
- collect reviewed and tested bys
- scrub barrier()

Sinan Kaya (6):
  RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
  RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered
    archs
  IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2

 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |  8 ++++----
 drivers/infiniband/hw/cxgb4/t4.h           | 14 +++++++-------
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c   |  6 ++++--
 drivers/infiniband/hw/i40iw/i40iw_osdep.h  |  1 +
 drivers/infiniband/hw/i40iw/i40iw_uk.c     |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c  | 11 +++++++++++
 drivers/infiniband/hw/mlx4/qp.c            |  4 ++--
 drivers/infiniband/hw/nes/nes.h            |  5 +++++
 drivers/infiniband/hw/nes/nes_hw.c         | 21 ++++++++++++++-------
 drivers/infiniband/hw/nes/nes_mgt.c        | 15 ++++++++++-----
 drivers/infiniband/hw/nes/nes_nic.c        |  2 +-
 drivers/infiniband/hw/nes/nes_utils.c      |  3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c      |  5 +++--
 drivers/infiniband/hw/qedr/verbs.c         |  4 ++--
 14 files changed, 67 insertions(+), 34 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47 ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v3:
- https://www.spinics.net/lists/arm-kernel/msg641851.html
- group patches together into subsystems ib:...
- collect reviewed and tested bys
- scrub barrier()

Sinan Kaya (6):
  RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
  RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered
    archs
  IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2

 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c |  8 ++++----
 drivers/infiniband/hw/cxgb4/t4.h           | 14 +++++++-------
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c   |  6 ++++--
 drivers/infiniband/hw/i40iw/i40iw_osdep.h  |  1 +
 drivers/infiniband/hw/i40iw/i40iw_uk.c     |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c  | 11 +++++++++++
 drivers/infiniband/hw/mlx4/qp.c            |  4 ++--
 drivers/infiniband/hw/nes/nes.h            |  5 +++++
 drivers/infiniband/hw/nes/nes_hw.c         | 21 ++++++++++++++-------
 drivers/infiniband/hw/nes/nes_mgt.c        | 15 ++++++++++-----
 drivers/infiniband/hw/nes/nes_nic.c        |  2 +-
 drivers/infiniband/hw/nes/nes_utils.c      |  3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c      |  5 +++--
 drivers/infiniband/hw/qedr/verbs.c         |  4 ++--
 14 files changed, 67 insertions(+), 34 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Selvin Xavier,
	Devesh Sharma, Somnath Kotur, Sriharsha Basavapatna,
	Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 8329ec6..4a6b981 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
 
 	/* ring CMDQ DB */
 	wmb();
-	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
-	       rcfw->cmdq_bar_reg_prod_off);
-	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
-	       rcfw->cmdq_bar_reg_trig_off);
+	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
+		       rcfw->cmdq_bar_reg_prod_off);
+	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
+		       rcfw->cmdq_bar_reg_trig_off);
 done:
 	spin_unlock_irqrestore(&cmdq->lock, flags);
 	/* Return the CREQ response pointer */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
index 8329ec6..4a6b981 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
@@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
 
 	/* ring CMDQ DB */
 	wmb();
-	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
-	       rcfw->cmdq_bar_reg_prod_off);
-	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
-	       rcfw->cmdq_bar_reg_trig_off);
+	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
+		       rcfw->cmdq_bar_reg_prod_off);
+	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
+		       rcfw->cmdq_bar_reg_trig_off);
 done:
 	spin_unlock_irqrestore(&cmdq->lock, flags);
 	/* Return the CREQ response pointer */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 2/6] IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Yishai Hadas,
	Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/mlx4/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index f045491..74b27b0 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -3880,8 +3880,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		 */
 		wmb();
 
-		writel(qp->doorbell_qpn,
-		       to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
+		writel_relaxed(qp->doorbell_qpn,
+			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
 
 		/*
 		 * Make sure doorbells don't leak out of SQ spinlock
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 2/6] IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/mlx4/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index f045491..74b27b0 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -3880,8 +3880,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		 */
 		wmb();
 
-		writel(qp->doorbell_qpn,
-		       to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
+		writel_relaxed(qp->doorbell_qpn,
+			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
 
 		/*
 		 * Make sure doorbells don't leak out of SQ spinlock
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Faisal Latif,
	Shiraz Saleem, Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
 drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
 drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index c74fd33..47f473e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -706,9 +706,11 @@ static void i40iw_sc_ccq_arm(struct i40iw_sc_cq *ccq)
 	wmb();       /* make sure shadow area is updated before arming */
 
 	if (ccq->dev->is_pf)
-		i40iw_wr32(ccq->dev->hw, I40E_PFPE_CQARM, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_PFPE_CQARM,
+				   ccq->cq_uk.cq_id);
 	else
-		i40iw_wr32(ccq->dev->hw, I40E_VFPE_CQARM1, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_VFPE_CQARM1,
+				   ccq->cq_uk.cq_id);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e..e06f4b9 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -213,5 +213,6 @@ void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
 #define i40iw_mmiowb() mmiowb()
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
+void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value);
 u32  i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif				/* _I40IW_OSDEP_H_ */
diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
index 8afa5a6..7f0ebed 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_uk.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
 
 	wmb(); /* make sure WQE is populated before valid bit is set */
 
-	writel(cq->cq_id, cq->cqe_alloc_reg);
+	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index ddc1056..99aa6f8 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -125,6 +125,17 @@ inline void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value)
 }
 
 /**
+ * i40iw_wr32_relaxed - write 32 bits to hw register without ordering
+ * @hw: hardware information including registers
+ * @reg: register offset
+ * @value: vvalue to write to register
+ */
+inline void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value)
+{
+	writel_relaxed(value, hw->hw_addr + reg);
+}
+
+/**
  * i40iw_rd32 - read a 32 bit hw register
  * @hw: hardware information including registers
  * @reg: register offset
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
 drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
 drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index c74fd33..47f473e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -706,9 +706,11 @@ static void i40iw_sc_ccq_arm(struct i40iw_sc_cq *ccq)
 	wmb();       /* make sure shadow area is updated before arming */
 
 	if (ccq->dev->is_pf)
-		i40iw_wr32(ccq->dev->hw, I40E_PFPE_CQARM, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_PFPE_CQARM,
+				   ccq->cq_uk.cq_id);
 	else
-		i40iw_wr32(ccq->dev->hw, I40E_VFPE_CQARM1, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_VFPE_CQARM1,
+				   ccq->cq_uk.cq_id);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e..e06f4b9 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -213,5 +213,6 @@ void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
 #define i40iw_mmiowb() mmiowb()
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
+void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value);
 u32  i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif				/* _I40IW_OSDEP_H_ */
diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
index 8afa5a6..7f0ebed 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_uk.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
 
 	wmb(); /* make sure WQE is populated before valid bit is set */
 
-	writel(cq->cq_id, cq->cqe_alloc_reg);
+	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index ddc1056..99aa6f8 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -125,6 +125,17 @@ inline void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value)
 }
 
 /**
+ * i40iw_wr32_relaxed - write 32 bits to hw register without ordering
+ * @hw: hardware information including registers
+ * @reg: register offset
+ * @value: vvalue to write to register
+ */
+inline void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value)
+{
+	writel_relaxed(value, hw->hw_addr + reg);
+}
+
+/**
  * i40iw_rd32 - read a 32 bit hw register
  * @hw: hardware information including registers
  * @reg: register offset
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Steve Wise,
	Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index 8369c7c..6e5658a 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64 *src)
 	int count = 8;
 
 	while (count) {
-		writeq(*src, dst);
+		writeq_relaxed(*src, dst);
 		src++;
 		dst++;
 		count--;
@@ -477,15 +477,15 @@ static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc, union t4_wr *wqe)
 				 (u64 *)wqe);
 		} else {
 			pr_debug("DB wq->sq.pidx = %d\n", wq->sq.pidx);
-			writel(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
-			       wq->sq.bar2_va + SGE_UDB_KDOORBELL);
+			writel_relaxed(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
+				       wq->sq.bar2_va + SGE_UDB_KDOORBELL);
 		}
 
 		/* Flush user doorbell area writes. */
 		wmb();
 		return;
 	}
-	writel(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
+	writel_relaxed(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
 }
 
 static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc,
@@ -502,15 +502,15 @@ static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc,
 				 (void *)wqe);
 		} else {
 			pr_debug("DB wq->rq.pidx = %d\n", wq->rq.pidx);
-			writel(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
-			       wq->rq.bar2_va + SGE_UDB_KDOORBELL);
+			writel_relaxed(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
+				       wq->rq.bar2_va + SGE_UDB_KDOORBELL);
 		}
 
 		/* Flush user doorbell area writes. */
 		wmb();
 		return;
 	}
-	writel(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
+	writel_relaxed(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
 }
 
 static inline int t4_wq_in_error(struct t4_wq *wq)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index 8369c7c..6e5658a 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64 *src)
 	int count = 8;
 
 	while (count) {
-		writeq(*src, dst);
+		writeq_relaxed(*src, dst);
 		src++;
 		dst++;
 		count--;
@@ -477,15 +477,15 @@ static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc, union t4_wr *wqe)
 				 (u64 *)wqe);
 		} else {
 			pr_debug("DB wq->sq.pidx = %d\n", wq->sq.pidx);
-			writel(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
-			       wq->sq.bar2_va + SGE_UDB_KDOORBELL);
+			writel_relaxed(PIDX_T5_V(inc) | QID_V(wq->sq.bar2_qid),
+				       wq->sq.bar2_va + SGE_UDB_KDOORBELL);
 		}
 
 		/* Flush user doorbell area writes. */
 		wmb();
 		return;
 	}
-	writel(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
+	writel_relaxed(QID_V(wq->sq.qid) | PIDX_V(inc), wq->db);
 }
 
 static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc,
@@ -502,15 +502,15 @@ static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc,
 				 (void *)wqe);
 		} else {
 			pr_debug("DB wq->rq.pidx = %d\n", wq->rq.pidx);
-			writel(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
-			       wq->rq.bar2_va + SGE_UDB_KDOORBELL);
+			writel_relaxed(PIDX_T5_V(inc) | QID_V(wq->rq.bar2_qid),
+				       wq->rq.bar2_va + SGE_UDB_KDOORBELL);
 		}
 
 		/* Flush user doorbell area writes. */
 		wmb();
 		return;
 	}
-	writel(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
+	writel_relaxed(QID_V(wq->rq.qid) | PIDX_V(inc), wq->db);
 }
 
 static inline int t4_wq_in_error(struct t4_wq *wq)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Faisal Latif,
	Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes barrier() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a barrier().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/nes/nes.h       |  5 +++++
 drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
 drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
 drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
 drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
 6 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
index 00c27291..85e007d 100644
--- a/drivers/infiniband/hw/nes/nes.h
+++ b/drivers/infiniband/hw/nes/nes.h
@@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
 	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
 }
 
+static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
+{
+	writel_relaxed(val, addr);
+}
+
 static inline void nes_write32(void __iomem *addr, u32 val)
 {
 	writel(val, addr);
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 18a7de1..568e17d 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
 
 	barrier();
 	/* Ring doorbell (5 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x05800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 
@@ -1594,7 +1595,8 @@ static void nes_replenish_nic_rq(struct nes_vnic *nesvnic)
 			atomic_dec(&nesvnic->rx_skbs_needed);
 			barrier();
 			if (++rx_wqes_posted == 255) {
-				nes_write32(nesdev->regs+NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesnic->qp_id);
+				nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				       (rx_wqes_posted << 24) | nesnic->qp_id);
 				rx_wqes_posted = 0;
 			}
 		} else {
@@ -1612,7 +1614,8 @@ static void nes_replenish_nic_rq(struct nes_vnic *nesvnic)
 	} while (atomic_read(&nesvnic->rx_skbs_needed));
 	barrier();
 	if (rx_wqes_posted)
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesnic->qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (rx_wqes_posted << 24) | nesnic->qp_id);
 	nesnic->replenishing_rq = 0;
 }
 
@@ -1795,7 +1798,8 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	barrier();
 
 	/* Ring doorbell (2 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x02800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 	nes_debug(NES_DBG_INIT, "Waiting for create NIC QP%u to complete.\n",
@@ -1844,7 +1848,8 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	do {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter << 24) | nesvnic->nic.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (counter << 24) | nesvnic->nic.qp_id);
 	} while (wqe_count);
 	timer_setup(&nesvnic->rq_wqes_timer, nes_rq_wqes_timeout, 0);
 	nes_debug(NES_DBG_INIT, "NAPI support Enabled\n");
@@ -1988,7 +1993,8 @@ void nes_destroy_nic_qp(struct nes_vnic *nesvnic)
 	barrier();
 
 	/* Ring doorbell (2 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x02800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 	nes_debug(NES_DBG_SHUTDOWN, "Waiting for CQP, cqp_head=%u, cqp.sq_head=%u,"
@@ -3064,7 +3070,8 @@ static void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq)
 				cqp_request, le32_to_cpu(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX])&0x3f, head);
 		/* Ring doorbell (1 WQEs) */
 		barrier();
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    0x01800000 | nesdev->cqp.qp_id);
 	}
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 
diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c
index 21e0ebd..5c5073c 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -96,7 +96,8 @@ static void nes_replenish_mgt_rq(struct nes_vnic_mgt *mgtvnic)
 			atomic_dec(&mgtvnic->rx_skbs_needed);
 			barrier();
 			if (++rx_wqes_posted == 255) {
-				nes_write32(nesdev->regs + NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesmgt->qp_id);
+				nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+					(rx_wqes_posted << 24) | nesmgt->qp_id);
 				rx_wqes_posted = 0;
 			}
 		} else {
@@ -115,7 +116,8 @@ static void nes_replenish_mgt_rq(struct nes_vnic_mgt *mgtvnic)
 	} while (atomic_read(&mgtvnic->rx_skbs_needed));
 	barrier();
 	if (rx_wqes_posted)
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesmgt->qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    (rx_wqes_posted << 24) | nesmgt->qp_id);
 	nesmgt->replenishing_rq = 0;
 }
 
@@ -995,7 +997,8 @@ int nes_init_mgt_qp(struct nes_device *nesdev, struct net_device *netdev, struct
 		barrier();
 
 		/* Ring doorbell (2 WQEs) */
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    0x02800000 | nesdev->cqp.qp_id);
 
 		spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 		nes_debug(NES_DBG_INIT, "Waiting for create MGT QP%u to complete.\n",
@@ -1050,7 +1053,8 @@ int nes_init_mgt_qp(struct nes_device *nesdev, struct net_device *netdev, struct
 		do {
 			counter = min(wqe_count, ((u32)255));
 			wqe_count -= counter;
-			nes_write32(nesdev->regs + NES_WQE_ALLOC, (counter << 24) | mgtvnic->mgt.qp_id);
+			nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+					(counter << 24) | mgtvnic->mgt.qp_id);
 		} while (wqe_count);
 
 		nes_write32(nesdev->regs + NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT |
@@ -1124,7 +1128,8 @@ void nes_destroy_mgt(struct nes_vnic *nesvnic)
 		barrier();
 
 		/* Ring doorbell (2 WQEs) */
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    0x02800000 | nesdev->cqp.qp_id);
 
 		spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 		nes_debug(NES_DBG_SHUTDOWN, "Waiting for CQP, cqp_head=%u, cqp.sq_head=%u,"
diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index 0a75164..0653da0 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -683,7 +683,7 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	barrier();
 
 	if (wqe_count)
-		nes_write32(nesdev->regs+NES_WQE_ALLOC,
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
 				(wqe_count << 24) | (1 << 23) | nesvnic->nic.qp_id);
 
 	netif_trans_update(netdev);
diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c
index 21b4a83..79a3d98 100644
--- a/drivers/infiniband/hw/nes/nes_utils.c
+++ b/drivers/infiniband/hw/nes/nes_utils.c
@@ -661,7 +661,8 @@ void nes_post_cqp_request(struct nes_device *nesdev,
 		barrier();
 
 		/* Ring doorbell (1 WQEs) */
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    0x01800000 | nesdev->cqp.qp_id);
 
 		barrier();
 	} else {
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 162475a..87f8635 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3310,7 +3310,7 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
 	while (wqe_count) {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs + NES_WQE_ALLOC,
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
 				(counter << 24) | 0x00800000 | nesqp->hwqp.qp_id);
 	}
 
@@ -3404,7 +3404,8 @@ static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr,
 	while (wqe_count) {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | nesqp->hwqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (counter<<24) | nesqp->hwqp.qp_id);
 	}
 
 	spin_unlock_irqrestore(&nesqp->lock, flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes barrier() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a barrier().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/nes/nes.h       |  5 +++++
 drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
 drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
 drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
 drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
 drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
 6 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
index 00c27291..85e007d 100644
--- a/drivers/infiniband/hw/nes/nes.h
+++ b/drivers/infiniband/hw/nes/nes.h
@@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
 	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
 }
 
+static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
+{
+	writel_relaxed(val, addr);
+}
+
 static inline void nes_write32(void __iomem *addr, u32 val)
 {
 	writel(val, addr);
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 18a7de1..568e17d 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
 
 	barrier();
 	/* Ring doorbell (5 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x05800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 
@@ -1594,7 +1595,8 @@ static void nes_replenish_nic_rq(struct nes_vnic *nesvnic)
 			atomic_dec(&nesvnic->rx_skbs_needed);
 			barrier();
 			if (++rx_wqes_posted == 255) {
-				nes_write32(nesdev->regs+NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesnic->qp_id);
+				nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				       (rx_wqes_posted << 24) | nesnic->qp_id);
 				rx_wqes_posted = 0;
 			}
 		} else {
@@ -1612,7 +1614,8 @@ static void nes_replenish_nic_rq(struct nes_vnic *nesvnic)
 	} while (atomic_read(&nesvnic->rx_skbs_needed));
 	barrier();
 	if (rx_wqes_posted)
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesnic->qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (rx_wqes_posted << 24) | nesnic->qp_id);
 	nesnic->replenishing_rq = 0;
 }
 
@@ -1795,7 +1798,8 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	barrier();
 
 	/* Ring doorbell (2 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x02800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 	nes_debug(NES_DBG_INIT, "Waiting for create NIC QP%u to complete.\n",
@@ -1844,7 +1848,8 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
 	do {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter << 24) | nesvnic->nic.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (counter << 24) | nesvnic->nic.qp_id);
 	} while (wqe_count);
 	timer_setup(&nesvnic->rq_wqes_timer, nes_rq_wqes_timeout, 0);
 	nes_debug(NES_DBG_INIT, "NAPI support Enabled\n");
@@ -1988,7 +1993,8 @@ void nes_destroy_nic_qp(struct nes_vnic *nesvnic)
 	barrier();
 
 	/* Ring doorbell (2 WQEs) */
-	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+			    0x02800000 | nesdev->cqp.qp_id);
 
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 	nes_debug(NES_DBG_SHUTDOWN, "Waiting for CQP, cqp_head=%u, cqp.sq_head=%u,"
@@ -3064,7 +3070,8 @@ static void nes_cqp_ce_handler(struct nes_device *nesdev, struct nes_hw_cq *cq)
 				cqp_request, le32_to_cpu(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX])&0x3f, head);
 		/* Ring doorbell (1 WQEs) */
 		barrier();
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    0x01800000 | nesdev->cqp.qp_id);
 	}
 	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 
diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c
index 21e0ebd..5c5073c 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -96,7 +96,8 @@ static void nes_replenish_mgt_rq(struct nes_vnic_mgt *mgtvnic)
 			atomic_dec(&mgtvnic->rx_skbs_needed);
 			barrier();
 			if (++rx_wqes_posted == 255) {
-				nes_write32(nesdev->regs + NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesmgt->qp_id);
+				nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+					(rx_wqes_posted << 24) | nesmgt->qp_id);
 				rx_wqes_posted = 0;
 			}
 		} else {
@@ -115,7 +116,8 @@ static void nes_replenish_mgt_rq(struct nes_vnic_mgt *mgtvnic)
 	} while (atomic_read(&mgtvnic->rx_skbs_needed));
 	barrier();
 	if (rx_wqes_posted)
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, (rx_wqes_posted << 24) | nesmgt->qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    (rx_wqes_posted << 24) | nesmgt->qp_id);
 	nesmgt->replenishing_rq = 0;
 }
 
@@ -995,7 +997,8 @@ int nes_init_mgt_qp(struct nes_device *nesdev, struct net_device *netdev, struct
 		barrier();
 
 		/* Ring doorbell (2 WQEs) */
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    0x02800000 | nesdev->cqp.qp_id);
 
 		spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 		nes_debug(NES_DBG_INIT, "Waiting for create MGT QP%u to complete.\n",
@@ -1050,7 +1053,8 @@ int nes_init_mgt_qp(struct nes_device *nesdev, struct net_device *netdev, struct
 		do {
 			counter = min(wqe_count, ((u32)255));
 			wqe_count -= counter;
-			nes_write32(nesdev->regs + NES_WQE_ALLOC, (counter << 24) | mgtvnic->mgt.qp_id);
+			nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+					(counter << 24) | mgtvnic->mgt.qp_id);
 		} while (wqe_count);
 
 		nes_write32(nesdev->regs + NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT |
@@ -1124,7 +1128,8 @@ void nes_destroy_mgt(struct nes_vnic *nesvnic)
 		barrier();
 
 		/* Ring doorbell (2 WQEs) */
-		nes_write32(nesdev->regs + NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
+				    0x02800000 | nesdev->cqp.qp_id);
 
 		spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
 		nes_debug(NES_DBG_SHUTDOWN, "Waiting for CQP, cqp_head=%u, cqp.sq_head=%u,"
diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
index 0a75164..0653da0 100644
--- a/drivers/infiniband/hw/nes/nes_nic.c
+++ b/drivers/infiniband/hw/nes/nes_nic.c
@@ -683,7 +683,7 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 	barrier();
 
 	if (wqe_count)
-		nes_write32(nesdev->regs+NES_WQE_ALLOC,
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
 				(wqe_count << 24) | (1 << 23) | nesvnic->nic.qp_id);
 
 	netif_trans_update(netdev);
diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c
index 21b4a83..79a3d98 100644
--- a/drivers/infiniband/hw/nes/nes_utils.c
+++ b/drivers/infiniband/hw/nes/nes_utils.c
@@ -661,7 +661,8 @@ void nes_post_cqp_request(struct nes_device *nesdev,
 		barrier();
 
 		/* Ring doorbell (1 WQEs) */
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    0x01800000 | nesdev->cqp.qp_id);
 
 		barrier();
 	} else {
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 162475a..87f8635 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -3310,7 +3310,7 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
 	while (wqe_count) {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs + NES_WQE_ALLOC,
+		nes_write32_relaxed(nesdev->regs + NES_WQE_ALLOC,
 				(counter << 24) | 0x00800000 | nesqp->hwqp.qp_id);
 	}
 
@@ -3404,7 +3404,8 @@ static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr,
 	while (wqe_count) {
 		counter = min(wqe_count, ((u32)255));
 		wqe_count -= counter;
-		nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | nesqp->hwqp.qp_id);
+		nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
+				    (counter<<24) | nesqp->hwqp.qp_id);
 	}
 
 	spin_unlock_irqrestore(&nesqp->lock, flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-20  2:47   ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Michal Kalderon,
	Ariel Elior, Doug Ledford, Jason Gunthorpe, linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index ccd55f4..db60360 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
 	wmb();
 	cq->db.data.agg_flags = flags;
 	cq->db.data.value = cpu_to_le32(cons);
-	writeq(cq->db.raw, cq->db_addr);
+	writeq_relaxed(cq->db.raw, cq->db_addr);
 
 	/* Make sure write would stick */
 	mmiowb();
@@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 		qp->rq.db_data.data.value++;
 
-		writel(qp->rq.db_data.raw, qp->rq.db);
+		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
 
 		/* Make sure write sticks */
 		mmiowb();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
@ 2018-03-20  2:47   ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/qedr/verbs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index ccd55f4..db60360 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
 	wmb();
 	cq->db.data.agg_flags = flags;
 	cq->db.data.value = cpu_to_le32(cons);
-	writeq(cq->db.raw, cq->db_addr);
+	writeq_relaxed(cq->db.raw, cq->db_addr);
 
 	/* Make sure write would stick */
 	mmiowb();
@@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 
 		qp->rq.db_data.data.value++;
 
-		writel(qp->rq.db_data.raw, qp->rq.db);
+		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
 
 		/* Make sure write sticks */
 		mmiowb();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20  7:38     ` Kalderon, Michal
  -1 siblings, 0 replies; 100+ messages in thread
From: Kalderon, Michal @ 2018-03-20  7:38 UTC (permalink / raw)
  To: Sinan Kaya, linux-rdma, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Elior, Ariel, Doug Ledford,
	Jason Gunthorpe, linux-kernel

> From: Sinan Kaya [mailto:okaya@codeaurora.org]
> Sent: Tuesday, March 20, 2018 4:48 AM
> 
> Code includes wmb() followed by writel() in multiple places. writel() already
> has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index ccd55f4..db60360 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons,
> u8 flags)
>  	wmb();
>  	cq->db.data.agg_flags = flags;
>  	cq->db.data.value = cpu_to_le32(cons);
> -	writeq(cq->db.raw, cq->db_addr);
> +	writeq_relaxed(cq->db.raw, cq->db_addr);
> 
>  	/* Make sure write would stick */
>  	mmiowb();
> @@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct
> ib_recv_wr *wr,
> 
>  		qp->rq.db_data.data.value++;
> 
> -		writel(qp->rq.db_data.raw, qp->rq.db);
> +		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
> 
>  		/* Make sure write sticks */
>  		mmiowb();
> --
> 2.7.4
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
@ 2018-03-20  7:38     ` Kalderon, Michal
  0 siblings, 0 replies; 100+ messages in thread
From: Kalderon, Michal @ 2018-03-20  7:38 UTC (permalink / raw)
  To: linux-arm-kernel

> From: Sinan Kaya [mailto:okaya at codeaurora.org]
> Sent: Tuesday, March 20, 2018 4:48 AM
> 
> Code includes wmb() followed by writel() in multiple places. writel() already
> has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index ccd55f4..db60360 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons,
> u8 flags)
>  	wmb();
>  	cq->db.data.agg_flags = flags;
>  	cq->db.data.value = cpu_to_le32(cons);
> -	writeq(cq->db.raw, cq->db_addr);
> +	writeq_relaxed(cq->db.raw, cq->db_addr);
> 
>  	/* Make sure write would stick */
>  	mmiowb();
> @@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct
> ib_recv_wr *wr,
> 
>  		qp->rq.db_data.data.value++;
> 
> -		writel(qp->rq.db_data.raw, qp->rq.db);
> +		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
> 
>  		/* Make sure write sticks */
>  		mmiowb();
> --
> 2.7.4
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:48     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:48 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> index 8329ec6..4a6b981 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>  
>  	/* ring CMDQ DB */
>  	wmb();
> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> -	       rcfw->cmdq_bar_reg_prod_off);
> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> -	       rcfw->cmdq_bar_reg_trig_off);
> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> +		       rcfw->cmdq_bar_reg_prod_off);
> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> +		       rcfw->cmdq_bar_reg_trig_off);

Woah, this may not be safe..

The definition of writel_relaxed() is that it is fully unordered, so
the above two writes may change order now. Broadcom guys would have to
ack if that it is OK or not for their hardware.

In general this is not an OK approach for a mechanical
conversion.. Only the first writel can be convereted.

You need to check all your patches to make sure there are no
subsequent writel's in the places touched.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 14:48     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> index 8329ec6..4a6b981 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>  
>  	/* ring CMDQ DB */
>  	wmb();
> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> -	       rcfw->cmdq_bar_reg_prod_off);
> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> -	       rcfw->cmdq_bar_reg_trig_off);
> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> +		       rcfw->cmdq_bar_reg_prod_off);
> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> +		       rcfw->cmdq_bar_reg_trig_off);

Woah, this may not be safe..

The definition of writel_relaxed() is that it is fully unordered, so
the above two writes may change order now. Broadcom guys would have to
ack if that it is OK or not for their hardware.

In general this is not an OK approach for a mechanical
conversion.. Only the first writel can be convereted.

You need to check all your patches to make sure there are no
subsequent writel's in the places touched.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 2/6] IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:48     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:48 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Yishai Hadas, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:44PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/infiniband/hw/mlx4/qp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index f045491..74b27b0 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -3880,8 +3880,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  		 */
>  		wmb();
>  
> -		writel(qp->doorbell_qpn,
> -		       to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
> +		writel_relaxed(qp->doorbell_qpn,
> +			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
>  
>  		/*
>  		 * Make sure doorbells don't leak out of SQ spinlock
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 2/6] IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 14:48     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:44PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/infiniband/hw/mlx4/qp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index f045491..74b27b0 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -3880,8 +3880,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  		 */
>  		wmb();
>  
> -		writel(qp->doorbell_qpn,
> -		       to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
> +		writel_relaxed(qp->doorbell_qpn,
> +			to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
>  
>  		/*
>  		 * Make sure doorbells don't leak out of SQ spinlock
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:51     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:51 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Steve Wise, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
> index 8369c7c..6e5658a 100644
> +++ b/drivers/infiniband/hw/cxgb4/t4.h
> @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64 *src)
>  	int count = 8;
>  
>  	while (count) {
> -		writeq(*src, dst);
> +		writeq_relaxed(*src, dst);
>  		src++;
>  		dst++;
>  		count--;

This is another case where writes can be re-ordered.. IIRC dst is WC
BAR memory, so the NIC should tolerate re-ordering, but Steve will
have to ack this.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 14:51     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
> index 8369c7c..6e5658a 100644
> +++ b/drivers/infiniband/hw/cxgb4/t4.h
> @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64 *src)
>  	int count = 8;
>  
>  	while (count) {
> -		writeq(*src, dst);
> +		writeq_relaxed(*src, dst);
>  		src++;
>  		dst++;
>  		count--;

This is another case where writes can be re-ordered.. IIRC dst is WC
BAR memory, so the NIC should tolerate re-ordering, but Steve will
have to ack this.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:54     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:54 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
> Code includes barrier() followed by writel(). writel() already has a
> barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a barrier().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>  6 files changed, 35 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
> index 00c27291..85e007d 100644
> +++ b/drivers/infiniband/hw/nes/nes.h
> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>  }
>  
> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
> +{
> +	writel_relaxed(val, addr);
> +}

This wrapper is pointless, let us not add more..

>  static inline void nes_write32(void __iomem *addr, u32 val)
>  {
>  	writel(val, addr);
> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
> index 18a7de1..568e17d 100644
> +++ b/drivers/infiniband/hw/nes/nes_hw.c
> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>  
>  	barrier();
>  	/* Ring doorbell (5 WQEs) */
> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
> +			    0x05800000 | nesdev->cqp.qp_id);

barrier() is not strong enough to order writel, so this doesn't seem
right?

It is probably noteven strong enough for what this driver thinks it is
doing..  This driver is essentially dead and broken, probably just
don't change it.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 14:54     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
> Code includes barrier() followed by writel(). writel() already has a
> barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a barrier().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>  6 files changed, 35 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
> index 00c27291..85e007d 100644
> +++ b/drivers/infiniband/hw/nes/nes.h
> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>  }
>  
> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
> +{
> +	writel_relaxed(val, addr);
> +}

This wrapper is pointless, let us not add more..

>  static inline void nes_write32(void __iomem *addr, u32 val)
>  {
>  	writel(val, addr);
> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
> index 18a7de1..568e17d 100644
> +++ b/drivers/infiniband/hw/nes/nes_hw.c
> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>  
>  	barrier();
>  	/* Ring doorbell (5 WQEs) */
> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
> +			    0x05800000 | nesdev->cqp.qp_id);

barrier() is not strong enough to order writel, so this doesn't seem
right?

It is probably noteven strong enough for what this driver thinks it is
doing..  This driver is essentially dead and broken, probably just
don't change it.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:55     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:55 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Michal Kalderon, Ariel Elior, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:48PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index ccd55f4..db60360 100644
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
>  	wmb();
>  	cq->db.data.agg_flags = flags;
>  	cq->db.data.value = cpu_to_le32(cons);
> -	writeq(cq->db.raw, cq->db_addr);
> +	writeq_relaxed(cq->db.raw, cq->db_addr);
>  
>  	/* Make sure write would stick */
>  	mmiowb();
> @@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
>  
>  		qp->rq.db_data.data.value++;
>  
> -		writel(qp->rq.db_data.raw, qp->rq.db);
> +		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
>  
>  		/* Make sure write sticks */
>  		mmiowb();

Looks fine, but the next lines should be relaxed too:


                /* Make sure write sticks */
                mmiowb();

                if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
                        writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
                        mmiowb();       /* for second doorbell */
                }

mmiowb() is strong enough to order writel, IIRC.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
@ 2018-03-20 14:55     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:48PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/qedr/verbs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
> index ccd55f4..db60360 100644
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -860,7 +860,7 @@ static void doorbell_cq(struct qedr_cq *cq, u32 cons, u8 flags)
>  	wmb();
>  	cq->db.data.agg_flags = flags;
>  	cq->db.data.value = cpu_to_le32(cons);
> -	writeq(cq->db.raw, cq->db_addr);
> +	writeq_relaxed(cq->db.raw, cq->db_addr);
>  
>  	/* Make sure write would stick */
>  	mmiowb();
> @@ -3338,7 +3338,7 @@ int qedr_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
>  
>  		qp->rq.db_data.data.value++;
>  
> -		writel(qp->rq.db_data.raw, qp->rq.db);
> +		writel_relaxed(qp->rq.db_data.raw, qp->rq.db);
>  
>  		/* Make sure write sticks */
>  		mmiowb();

Looks fine, but the next lines should be relaxed too:


                /* Make sure write sticks */
                mmiowb();

                if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
                        writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
                        mmiowb();       /* for second doorbell */
                }

mmiowb() is strong enough to order writel, IIRC.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-20 14:56     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:56 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Shiraz Saleem, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
>  drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
>  drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
>  drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
>  4 files changed, 17 insertions(+), 3 deletions(-)

The one looks fine

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 14:56     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 14:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
>  drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
>  drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
>  drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
>  4 files changed, 17 insertions(+), 3 deletions(-)

The one looks fine

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 14:48     ` Jason Gunthorpe
@ 2018-03-20 15:00       ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>> Code includes wmb() followed by writel(). writel() already has a barrier on
>> some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>> index 8329ec6..4a6b981 100644
>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>  
>>  	/* ring CMDQ DB */
>>  	wmb();
>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>> -	       rcfw->cmdq_bar_reg_prod_off);
>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>> -	       rcfw->cmdq_bar_reg_trig_off);
>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>> +		       rcfw->cmdq_bar_reg_prod_off);
>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>> +		       rcfw->cmdq_bar_reg_trig_off);
> 
> Woah, this may not be safe..
> 
> The definition of writel_relaxed() is that it is fully unordered, so
> the above two writes may change order now. Broadcom guys would have to
> ack if that it is OK or not for their hardware.
> 
> In general this is not an OK approach for a mechanical
> conversion.. Only the first writel can be convereted.
> 
> You need to check all your patches to make sure there are no
> subsequent writel's in the places touched.

I paid special attention to this one and went to check the barriers
document. According to the document, writes (whether it is relaxed or not)
are always observed by the HW inorder with respect to each other.
It just doesn't guarantee anything above writel_relaxed() to be observed.
Since we already have a wmb(), this is taken care of. 

If somebody wants things to be observed after register write, there should
have been a wmb() or mmiowb() afterwards.


> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:00       ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>> Code includes wmb() followed by writel(). writel() already has a barrier on
>> some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>> index 8329ec6..4a6b981 100644
>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>  
>>  	/* ring CMDQ DB */
>>  	wmb();
>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>> -	       rcfw->cmdq_bar_reg_prod_off);
>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>> -	       rcfw->cmdq_bar_reg_trig_off);
>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>> +		       rcfw->cmdq_bar_reg_prod_off);
>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>> +		       rcfw->cmdq_bar_reg_trig_off);
> 
> Woah, this may not be safe..
> 
> The definition of writel_relaxed() is that it is fully unordered, so
> the above two writes may change order now. Broadcom guys would have to
> ack if that it is OK or not for their hardware.
> 
> In general this is not an OK approach for a mechanical
> conversion.. Only the first writel can be convereted.
> 
> You need to check all your patches to make sure there are no
> subsequent writel's in the places touched.

I paid special attention to this one and went to check the barriers
document. According to the document, writes (whether it is relaxed or not)
are always observed by the HW inorder with respect to each other.
It just doesn't guarantee anything above writel_relaxed() to be observed.
Since we already have a wmb(), this is taken care of. 

If somebody wants things to be observed after register write, there should
have been a wmb() or mmiowb() afterwards.


> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:00       ` Sinan Kaya
@ 2018-03-20 15:08         ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On 3/20/2018 10:00 AM, Sinan Kaya wrote:
> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel(). writel() already has a barrier on
>>> some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>> index 8329ec6..4a6b981 100644
>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>>  
>>>  	/* ring CMDQ DB */
>>>  	wmb();
>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>> -	       rcfw->cmdq_bar_reg_prod_off);
>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>> -	       rcfw->cmdq_bar_reg_trig_off);
>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>> +		       rcfw->cmdq_bar_reg_prod_off);
>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>> +		       rcfw->cmdq_bar_reg_trig_off);
>>
>> Woah, this may not be safe..
>>
>> The definition of writel_relaxed() is that it is fully unordered, so
>> the above two writes may change order now. Broadcom guys would have to
>> ack if that it is OK or not for their hardware.
>>
>> In general this is not an OK approach for a mechanical
>> conversion.. Only the first writel can be convereted.
>>
>> You need to check all your patches to make sure there are no
>> subsequent writel's in the places touched.
> 
> I paid special attention to this one and went to check the barriers
> document. According to the document, writes (whether it is relaxed or not)
> are always observed by the HW inorder with respect to each other.
> It just doesn't guarantee anything above writel_relaxed() to be observed.
> Since we already have a wmb(), this is taken care of. 
> 
> If somebody wants things to be observed after register write, there should
> have been a wmb() or mmiowb() afterwards.

Never mind, it will break some architectures. I'll only change the first one.

 (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
     so for _all_ general drivers locks should be used and mmiowb() must be
     issued prior to unlocking the critical section.

 (2) If the accessor functions are used to refer to an I/O memory window with
     relaxed memory access properties, then _mandatory_ memory barriers are
     required to enforce ordering. 


> 
> 
>>
>> Jason
>>
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:08         ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/20/2018 10:00 AM, Sinan Kaya wrote:
> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel(). writel() already has a barrier on
>>> some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>> index 8329ec6..4a6b981 100644
>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>>  
>>>  	/* ring CMDQ DB */
>>>  	wmb();
>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>> -	       rcfw->cmdq_bar_reg_prod_off);
>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>> -	       rcfw->cmdq_bar_reg_trig_off);
>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>> +		       rcfw->cmdq_bar_reg_prod_off);
>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>> +		       rcfw->cmdq_bar_reg_trig_off);
>>
>> Woah, this may not be safe..
>>
>> The definition of writel_relaxed() is that it is fully unordered, so
>> the above two writes may change order now. Broadcom guys would have to
>> ack if that it is OK or not for their hardware.
>>
>> In general this is not an OK approach for a mechanical
>> conversion.. Only the first writel can be convereted.
>>
>> You need to check all your patches to make sure there are no
>> subsequent writel's in the places touched.
> 
> I paid special attention to this one and went to check the barriers
> document. According to the document, writes (whether it is relaxed or not)
> are always observed by the HW inorder with respect to each other.
> It just doesn't guarantee anything above writel_relaxed() to be observed.
> Since we already have a wmb(), this is taken care of. 
> 
> If somebody wants things to be observed after register write, there should
> have been a wmb() or mmiowb() afterwards.

Never mind, it will break some architectures. I'll only change the first one.

 (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
     so for _all_ general drivers locks should be used and mmiowb() must be
     issued prior to unlocking the critical section.

 (2) If the accessor functions are used to refer to an I/O memory window with
     relaxed memory access properties, then _mandatory_ memory barriers are
     required to enforce ordering. 


> 
> 
>>
>> Jason
>>
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 14:51     ` Jason Gunthorpe
  (?)
@ 2018-03-20 15:10       ` Steve Wise
  -1 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:10 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Sinan Kaya'
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	'Steve Wise', 'Doug Ledford',
	linux-kernel

> On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > Code includes wmb() followed by writel(). writel() already has a barrier
on
> > some architectures like arm64.
> >
> > This ends up CPU observing two barriers back to back before executing
> the
> > register write.
> >
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> >
> > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> b/drivers/infiniband/hw/cxgb4/t4.h
> > index 8369c7c..6e5658a 100644
> > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64
> *src)
> >  	int count = 8;
> >
> >  	while (count) {
> > -		writeq(*src, dst);
> > +		writeq_relaxed(*src, dst);
> >  		src++;
> >  		dst++;
> >  		count--;
> 
> This is another case where writes can be re-ordered.. IIRC dst is WC
> BAR memory, so the NIC should tolerate re-ordering, but Steve will
> have to ack this.
> 

Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-combining this into a single 64B pci-e transaction.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:10       ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:10 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Sinan Kaya'
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	'Steve Wise', 'Doug Ledford',
	linux-kernel

> On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > Code includes wmb() followed by writel(). writel() already has a barrier
on
> > some architectures like arm64.
> >
> > This ends up CPU observing two barriers back to back before executing
> the
> > register write.
> >
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> >
> > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> b/drivers/infiniband/hw/cxgb4/t4.h
> > index 8369c7c..6e5658a 100644
> > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64
> *src)
> >  	int count = 8;
> >
> >  	while (count) {
> > -		writeq(*src, dst);
> > +		writeq_relaxed(*src, dst);
> >  		src++;
> >  		dst++;
> >  		count--;
> 
> This is another case where writes can be re-ordered.. IIRC dst is WC
> BAR memory, so the NIC should tolerate re-ordering, but Steve will
> have to ack this.
> 

Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-combining this into a single 64B pci-e transaction.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:10       ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:10 UTC (permalink / raw)
  To: linux-arm-kernel

> On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > Code includes wmb() followed by writel(). writel() already has a barrier
on
> > some architectures like arm64.
> >
> > This ends up CPU observing two barriers back to back before executing
> the
> > register write.
> >
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> >
> > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> b/drivers/infiniband/hw/cxgb4/t4.h
> > index 8369c7c..6e5658a 100644
> > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst, u64
> *src)
> >  	int count = 8;
> >
> >  	while (count) {
> > -		writeq(*src, dst);
> > +		writeq_relaxed(*src, dst);
> >  		src++;
> >  		dst++;
> >  		count--;
> 
> This is another case where writes can be re-ordered.. IIRC dst is WC
> BAR memory, so the NIC should tolerate re-ordering, but Steve will
> have to ack this.
> 

Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-combining this into a single 64B pci-e transaction.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:00       ` Sinan Kaya
@ 2018-03-20 15:20         ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 15:20 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> >> Code includes wmb() followed by writel(). writel() already has a barrier on
> >> some architectures like arm64.
> >>
> >> This ends up CPU observing two barriers back to back before executing the
> >> register write.
> >>
> >> Since code already has an explicit barrier call, changing writel() to
> >> writel_relaxed().
> >>
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >> index 8329ec6..4a6b981 100644
> >> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
> >>  
> >>  	/* ring CMDQ DB */
> >>  	wmb();
> >> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >> -	       rcfw->cmdq_bar_reg_prod_off);
> >> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >> -	       rcfw->cmdq_bar_reg_trig_off);
> >> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >> +		       rcfw->cmdq_bar_reg_prod_off);
> >> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >> +		       rcfw->cmdq_bar_reg_trig_off);
> > 
> > Woah, this may not be safe..
> > 
> > The definition of writel_relaxed() is that it is fully unordered, so
> > the above two writes may change order now. Broadcom guys would have to
> > ack if that it is OK or not for their hardware.
> > 
> > In general this is not an OK approach for a mechanical
> > conversion.. Only the first writel can be convereted.
> > 
> > You need to check all your patches to make sure there are no
> > subsequent writel's in the places touched.
> 
> I paid special attention to this one and went to check the barriers
> document. According to the document, writes (whether it is relaxed or not)
> are always observed by the HW inorder with respect to each other.

Oh interesting, that document got revised to make writel_relaxed less
relaxed a few years ago, didn't know that. Thanks.

However, this is still not OK, the full code is:

        /* ring CMDQ DB */
        wmb();
        writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
               rcfw->cmdq_bar_reg_prod_off);
        writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
               rcfw->cmdq_bar_reg_trig_off);
done:
        spin_unlock_irqrestore(&cmdq->lock, flags);


And the definition of _relaxed allows the writes to order outside the
spinlock region, which is very likely to be wrong in this driver.

I'm not sure adding a mmiowb() just to use a writel_relaxed is any
sort of win though?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:20         ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> >> Code includes wmb() followed by writel(). writel() already has a barrier on
> >> some architectures like arm64.
> >>
> >> This ends up CPU observing two barriers back to back before executing the
> >> register write.
> >>
> >> Since code already has an explicit barrier call, changing writel() to
> >> writel_relaxed().
> >>
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >> index 8329ec6..4a6b981 100644
> >> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
> >>  
> >>  	/* ring CMDQ DB */
> >>  	wmb();
> >> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >> -	       rcfw->cmdq_bar_reg_prod_off);
> >> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >> -	       rcfw->cmdq_bar_reg_trig_off);
> >> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >> +		       rcfw->cmdq_bar_reg_prod_off);
> >> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >> +		       rcfw->cmdq_bar_reg_trig_off);
> > 
> > Woah, this may not be safe..
> > 
> > The definition of writel_relaxed() is that it is fully unordered, so
> > the above two writes may change order now. Broadcom guys would have to
> > ack if that it is OK or not for their hardware.
> > 
> > In general this is not an OK approach for a mechanical
> > conversion.. Only the first writel can be convereted.
> > 
> > You need to check all your patches to make sure there are no
> > subsequent writel's in the places touched.
> 
> I paid special attention to this one and went to check the barriers
> document. According to the document, writes (whether it is relaxed or not)
> are always observed by the HW inorder with respect to each other.

Oh interesting, that document got revised to make writel_relaxed less
relaxed a few years ago, didn't know that. Thanks.

However, this is still not OK, the full code is:

        /* ring CMDQ DB */
        wmb();
        writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
               rcfw->cmdq_bar_reg_prod_off);
        writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
               rcfw->cmdq_bar_reg_trig_off);
done:
        spin_unlock_irqrestore(&cmdq->lock, flags);


And the definition of _relaxed allows the writes to order outside the
spinlock region, which is very likely to be wrong in this driver.

I'm not sure adding a mmiowb() just to use a writel_relaxed is any
sort of win though?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 14:54     ` Jason Gunthorpe
@ 2018-03-20 15:23       ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Doug Ledford, linux-kernel

On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
>> Code includes barrier() followed by writel(). writel() already has a
>> barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Create a new wrapper function with relaxed write operator. Use the new
>> wrapper when a write is following a barrier().
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>>  6 files changed, 35 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
>> index 00c27291..85e007d 100644
>> +++ b/drivers/infiniband/hw/nes/nes.h
>> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>>  }
>>  
>> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
>> +{
>> +	writel_relaxed(val, addr);
>> +}
> 
> This wrapper is pointless, let us not add more..
> 
>>  static inline void nes_write32(void __iomem *addr, u32 val)
>>  {
>>  	writel(val, addr);
>> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
>> index 18a7de1..568e17d 100644
>> +++ b/drivers/infiniband/hw/nes/nes_hw.c
>> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>>  
>>  	barrier();
>>  	/* Ring doorbell (5 WQEs) */
>> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
>> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
>> +			    0x05800000 | nesdev->cqp.qp_id);
> 
> barrier() is not strong enough to order writel, so this doesn't seem
> right?
> 
> It is probably noteven strong enough for what this driver thinks it is
> doing..  This driver is essentially dead and broken, probably just
> don't change it.

Just for the sake of other changes in netdev directory and my education...

barrier() on ARM is a wmb(). It should be a compiler barrier on intel.

You are saying barrier() should have been a wmb(), right?

I have gone through similar exercise on netdev directory and changed

barrier()
writel()

to 

barrier()
writel_relaxed()

Do you see any problem with this?

If the goal is to make memory changes observable to the hardware, it should
have been, right not barrier()?

wmb()
writel_relaxed()

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:23       ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
>> Code includes barrier() followed by writel(). writel() already has a
>> barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Create a new wrapper function with relaxed write operator. Use the new
>> wrapper when a write is following a barrier().
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>>  6 files changed, 35 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
>> index 00c27291..85e007d 100644
>> +++ b/drivers/infiniband/hw/nes/nes.h
>> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>>  }
>>  
>> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
>> +{
>> +	writel_relaxed(val, addr);
>> +}
> 
> This wrapper is pointless, let us not add more..
> 
>>  static inline void nes_write32(void __iomem *addr, u32 val)
>>  {
>>  	writel(val, addr);
>> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
>> index 18a7de1..568e17d 100644
>> +++ b/drivers/infiniband/hw/nes/nes_hw.c
>> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>>  
>>  	barrier();
>>  	/* Ring doorbell (5 WQEs) */
>> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
>> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
>> +			    0x05800000 | nesdev->cqp.qp_id);
> 
> barrier() is not strong enough to order writel, so this doesn't seem
> right?
> 
> It is probably noteven strong enough for what this driver thinks it is
> doing..  This driver is essentially dead and broken, probably just
> don't change it.

Just for the sake of other changes in netdev directory and my education...

barrier() on ARM is a wmb(). It should be a compiler barrier on intel.

You are saying barrier() should have been a wmb(), right?

I have gone through similar exercise on netdev directory and changed

barrier()
writel()

to 

barrier()
writel_relaxed()

Do you see any problem with this?

If the goal is to make memory changes observable to the hardware, it should
have been, right not barrier()?

wmb()
writel_relaxed()

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:08         ` Sinan Kaya
@ 2018-03-20 15:23           ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 15:23 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On Tue, Mar 20, 2018 at 10:08:16AM -0500, Sinan Kaya wrote:

> Never mind, it will break some architectures. I'll only change the first one.
> 
>  (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
>      so for _all_ general drivers locks should be used and mmiowb() must be
>      issued prior to unlocking the critical section.

I think the kernel could do well to have a spin_unlock_mmiowb()
function. We have this patern quite a bit.

Arches like x86 can just make it == spin_unlock, while PPC and ARM can
add their extra barriers.

Then we can safely and efficiently use _realxed within such a
spinlock region.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:23           ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 20, 2018 at 10:08:16AM -0500, Sinan Kaya wrote:

> Never mind, it will break some architectures. I'll only change the first one.
> 
>  (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
>      so for _all_ general drivers locks should be used and mmiowb() must be
>      issued prior to unlocking the critical section.

I think the kernel could do well to have a spin_unlock_mmiowb()
function. We have this patern quite a bit.

Arches like x86 can just make it == spin_unlock, while PPC and ARM can
add their extra barriers.

Then we can safely and efficiently use _realxed within such a
spinlock region.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:20         ` Jason Gunthorpe
@ 2018-03-20 15:30           ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On 3/20/2018 10:20 AM, Jason Gunthorpe wrote:
> On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
>> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
>>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>>>> Code includes wmb() followed by writel(). writel() already has a barrier on
>>>> some architectures like arm64.
>>>>
>>>> This ends up CPU observing two barriers back to back before executing the
>>>> register write.
>>>>
>>>> Since code already has an explicit barrier call, changing writel() to
>>>> writel_relaxed().
>>>>
>>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>>> index 8329ec6..4a6b981 100644
>>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>>>  
>>>>  	/* ring CMDQ DB */
>>>>  	wmb();
>>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>>> -	       rcfw->cmdq_bar_reg_prod_off);
>>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>>> -	       rcfw->cmdq_bar_reg_trig_off);
>>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>>> +		       rcfw->cmdq_bar_reg_prod_off);
>>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>>> +		       rcfw->cmdq_bar_reg_trig_off);
>>>
>>> Woah, this may not be safe..
>>>
>>> The definition of writel_relaxed() is that it is fully unordered, so
>>> the above two writes may change order now. Broadcom guys would have to
>>> ack if that it is OK or not for their hardware.
>>>
>>> In general this is not an OK approach for a mechanical
>>> conversion.. Only the first writel can be convereted.
>>>
>>> You need to check all your patches to make sure there are no
>>> subsequent writel's in the places touched.
>>
>> I paid special attention to this one and went to check the barriers
>> document. According to the document, writes (whether it is relaxed or not)
>> are always observed by the HW inorder with respect to each other.
> 
> Oh interesting, that document got revised to make writel_relaxed less
> relaxed a few years ago, didn't know that. Thanks.
> 
> However, this is still not OK, the full code is:
> 
>         /* ring CMDQ DB */
>         wmb();
>         writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>                rcfw->cmdq_bar_reg_prod_off);
>         writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>                rcfw->cmdq_bar_reg_trig_off);
> done:
>         spin_unlock_irqrestore(&cmdq->lock, flags);
> 
> 
> And the definition of _relaxed allows the writes to order outside the
> spinlock region, which is very likely to be wrong in this driver.
> 
> I'm not sure adding a mmiowb() just to use a writel_relaxed is any
> sort of win though?

I'd prefer this. 

mmiowb() on ARM64 is empty. mmiowb() guarantees that code also works for PPC too.

I'll switch to this instead so it works for everybody.

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:30           ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/20/2018 10:20 AM, Jason Gunthorpe wrote:
> On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
>> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
>>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
>>>> Code includes wmb() followed by writel(). writel() already has a barrier on
>>>> some architectures like arm64.
>>>>
>>>> This ends up CPU observing two barriers back to back before executing the
>>>> register write.
>>>>
>>>> Since code already has an explicit barrier call, changing writel() to
>>>> writel_relaxed().
>>>>
>>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>>> index 8329ec6..4a6b981 100644
>>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
>>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
>>>>  
>>>>  	/* ring CMDQ DB */
>>>>  	wmb();
>>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>>> -	       rcfw->cmdq_bar_reg_prod_off);
>>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>>> -	       rcfw->cmdq_bar_reg_trig_off);
>>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>>>> +		       rcfw->cmdq_bar_reg_prod_off);
>>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>>>> +		       rcfw->cmdq_bar_reg_trig_off);
>>>
>>> Woah, this may not be safe..
>>>
>>> The definition of writel_relaxed() is that it is fully unordered, so
>>> the above two writes may change order now. Broadcom guys would have to
>>> ack if that it is OK or not for their hardware.
>>>
>>> In general this is not an OK approach for a mechanical
>>> conversion.. Only the first writel can be convereted.
>>>
>>> You need to check all your patches to make sure there are no
>>> subsequent writel's in the places touched.
>>
>> I paid special attention to this one and went to check the barriers
>> document. According to the document, writes (whether it is relaxed or not)
>> are always observed by the HW inorder with respect to each other.
> 
> Oh interesting, that document got revised to make writel_relaxed less
> relaxed a few years ago, didn't know that. Thanks.
> 
> However, this is still not OK, the full code is:
> 
>         /* ring CMDQ DB */
>         wmb();
>         writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
>                rcfw->cmdq_bar_reg_prod_off);
>         writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
>                rcfw->cmdq_bar_reg_trig_off);
> done:
>         spin_unlock_irqrestore(&cmdq->lock, flags);
> 
> 
> And the definition of _relaxed allows the writes to order outside the
> spinlock region, which is very likely to be wrong in this driver.
> 
> I'm not sure adding a mmiowb() just to use a writel_relaxed is any
> sort of win though?

I'd prefer this. 

mmiowb() on ARM64 is empty. mmiowb() guarantees that code also works for PPC too.

I'll switch to this instead so it works for everybody.

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 14:51     ` Jason Gunthorpe
  (?)
@ 2018-03-20 15:38       ` Steve Wise
  -1 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:38 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Sinan Kaya'
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	'Steve Wise', 'Doug Ledford',
	linux-kernel

> > On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > > Code includes wmb() followed by writel(). writel() already has a
barrier
> on
> > > some architectures like arm64.
> > >
> > > This ends up CPU observing two barriers back to back before executing
> > the
> > > register write.
> > >
> > > Since code already has an explicit barrier call, changing writel() to
> > > writel_relaxed().
> > >
> > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> > >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> > b/drivers/infiniband/hw/cxgb4/t4.h
> > > index 8369c7c..6e5658a 100644
> > > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst,
> u64
> > *src)
> > >  	int count = 8;
> > >
> > >  	while (count) {
> > > -		writeq(*src, dst);
> > > +		writeq_relaxed(*src, dst);
> > >  		src++;
> > >  		dst++;
> > >  		count--;
> >
> > This is another case where writes can be re-ordered.. IIRC dst is WC
> > BAR memory, so the NIC should tolerate re-ordering, but Steve will
> > have to ack this.
> >
> 
> Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-
> combining this into a single 64B pci-e transaction.
> 


I'd like to see the PPC issue resolved...but

Acked-by: Steve Wise <swise@opengridcomputing.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:38       ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:38 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Sinan Kaya'
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	'Steve Wise', 'Doug Ledford',
	linux-kernel

> > On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > > Code includes wmb() followed by writel(). writel() already has a
barrier
> on
> > > some architectures like arm64.
> > >
> > > This ends up CPU observing two barriers back to back before executing
> > the
> > > register write.
> > >
> > > Since code already has an explicit barrier call, changing writel() to
> > > writel_relaxed().
> > >
> > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> > >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> > b/drivers/infiniband/hw/cxgb4/t4.h
> > > index 8369c7c..6e5658a 100644
> > > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst,
> u64
> > *src)
> > >  	int count = 8;
> > >
> > >  	while (count) {
> > > -		writeq(*src, dst);
> > > +		writeq_relaxed(*src, dst);
> > >  		src++;
> > >  		dst++;
> > >  		count--;
> >
> > This is another case where writes can be re-ordered.. IIRC dst is WC
> > BAR memory, so the NIC should tolerate re-ordering, but Steve will
> > have to ack this.
> >
> 
> Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-
> combining this into a single 64B pci-e transaction.
> 


I'd like to see the PPC issue resolved...but

Acked-by: Steve Wise <swise@opengridcomputing.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 15:38       ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-20 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

> > On Mon, Mar 19, 2018 at 10:47:46PM -0400, Sinan Kaya wrote:
> > > Code includes wmb() followed by writel(). writel() already has a
barrier
> on
> > > some architectures like arm64.
> > >
> > > This ends up CPU observing two barriers back to back before executing
> > the
> > > register write.
> > >
> > > Since code already has an explicit barrier call, changing writel() to
> > > writel_relaxed().
> > >
> > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> > >  drivers/infiniband/hw/cxgb4/t4.h | 14 +++++++-------
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/cxgb4/t4.h
> > b/drivers/infiniband/hw/cxgb4/t4.h
> > > index 8369c7c..6e5658a 100644
> > > +++ b/drivers/infiniband/hw/cxgb4/t4.h
> > > @@ -457,7 +457,7 @@ static inline void pio_copy(u64 __iomem *dst,
> u64
> > *src)
> > >  	int count = 8;
> > >
> > >  	while (count) {
> > > -		writeq(*src, dst);
> > > +		writeq_relaxed(*src, dst);
> > >  		src++;
> > >  		dst++;
> > >  		count--;
> >
> > This is another case where writes can be re-ordered.. IIRC dst is WC
> > BAR memory, so the NIC should tolerate re-ordering, but Steve will
> > have to ack this.
> >
> 
> Yes, this is WC BAR memory.  The goal is that pio_copy() will enable
write-
> combining this into a single 64B pci-e transaction.
> 


I'd like to see the PPC issue resolved...but

Acked-by: Steve Wise <swise@opengridcomputing.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:23       ` Sinan Kaya
@ 2018-03-20 16:01         ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:01 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Doug Ledford, linux-kernel

On Tue, Mar 20, 2018 at 10:23:16AM -0500, Sinan Kaya wrote:
> On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
> >> Code includes barrier() followed by writel(). writel() already has a
> >> barrier on some architectures like arm64.
> >>
> >> This ends up CPU observing two barriers back to back before executing the
> >> register write.
> >>
> >> Create a new wrapper function with relaxed write operator. Use the new
> >> wrapper when a write is following a barrier().
> >>
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
> >>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
> >>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
> >>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
> >>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
> >>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
> >>  6 files changed, 35 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
> >> index 00c27291..85e007d 100644
> >> +++ b/drivers/infiniband/hw/nes/nes.h
> >> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
> >>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
> >>  }
> >>  
> >> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
> >> +{
> >> +	writel_relaxed(val, addr);
> >> +}
> > 
> > This wrapper is pointless, let us not add more..
> > 
> >>  static inline void nes_write32(void __iomem *addr, u32 val)
> >>  {
> >>  	writel(val, addr);
> >> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
> >> index 18a7de1..568e17d 100644
> >> +++ b/drivers/infiniband/hw/nes/nes_hw.c
> >> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
> >>  
> >>  	barrier();
> >>  	/* Ring doorbell (5 WQEs) */
> >> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
> >> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
> >> +			    0x05800000 | nesdev->cqp.qp_id);
> > 
> > barrier() is not strong enough to order writel, so this doesn't seem
> > right?
> > 
> > It is probably noteven strong enough for what this driver thinks it is
> > doing..  This driver is essentially dead and broken, probably just
> > don't change it.
> 
> Just for the sake of other changes in netdev directory and my education...
> 
> barrier() on ARM is a wmb(). It should be a compiler barrier on intel.
> 
> You are saying barrier() should have been a wmb(), right?

Yes, that is my understanding.. barrier() is supposed to be a very
weak barrier that just ensures the CPU is locally consistent with
itself. It doesn't say anything about DMA access, or SMP cases.

I don't think it is supposed to order anything related to
writel_relaxed()

> I have gone through similar exercise on netdev directory and changed
> 
> barrier()
> writel()
> 
> to 
> 
> barrier()
> writel_relaxed()
> 
> Do you see any problem with this?

Seems dangerous as a mechanical change to me, it really depends on why
the driver author put barrier() there.

In this case, I strongly suspect nes really intended to say wmb()

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 16:01         ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 20, 2018 at 10:23:16AM -0500, Sinan Kaya wrote:
> On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
> > On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
> >> Code includes barrier() followed by writel(). writel() already has a
> >> barrier on some architectures like arm64.
> >>
> >> This ends up CPU observing two barriers back to back before executing the
> >> register write.
> >>
> >> Create a new wrapper function with relaxed write operator. Use the new
> >> wrapper when a write is following a barrier().
> >>
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
> >>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
> >>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
> >>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
> >>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
> >>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
> >>  6 files changed, 35 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
> >> index 00c27291..85e007d 100644
> >> +++ b/drivers/infiniband/hw/nes/nes.h
> >> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
> >>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
> >>  }
> >>  
> >> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
> >> +{
> >> +	writel_relaxed(val, addr);
> >> +}
> > 
> > This wrapper is pointless, let us not add more..
> > 
> >>  static inline void nes_write32(void __iomem *addr, u32 val)
> >>  {
> >>  	writel(val, addr);
> >> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
> >> index 18a7de1..568e17d 100644
> >> +++ b/drivers/infiniband/hw/nes/nes_hw.c
> >> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
> >>  
> >>  	barrier();
> >>  	/* Ring doorbell (5 WQEs) */
> >> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
> >> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
> >> +			    0x05800000 | nesdev->cqp.qp_id);
> > 
> > barrier() is not strong enough to order writel, so this doesn't seem
> > right?
> > 
> > It is probably noteven strong enough for what this driver thinks it is
> > doing..  This driver is essentially dead and broken, probably just
> > don't change it.
> 
> Just for the sake of other changes in netdev directory and my education...
> 
> barrier() on ARM is a wmb(). It should be a compiler barrier on intel.
> 
> You are saying barrier() should have been a wmb(), right?

Yes, that is my understanding.. barrier() is supposed to be a very
weak barrier that just ensures the CPU is locally consistent with
itself. It doesn't say anything about DMA access, or SMP cases.

I don't think it is supposed to order anything related to
writel_relaxed()

> I have gone through similar exercise on netdev directory and changed
> 
> barrier()
> writel()
> 
> to 
> 
> barrier()
> writel_relaxed()
> 
> Do you see any problem with this?

Seems dangerous as a mechanical change to me, it really depends on why
the driver author put barrier() there.

In this case, I strongly suspect nes really intended to say wmb()

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 15:30           ` Sinan Kaya
@ 2018-03-20 16:02             ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:02 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Selvin Xavier, Devesh Sharma, Somnath Kotur,
	Sriharsha Basavapatna, Doug Ledford, linux-kernel

On Tue, Mar 20, 2018 at 10:30:34AM -0500, Sinan Kaya wrote:
> On 3/20/2018 10:20 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
> >> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> >>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> >>>> Code includes wmb() followed by writel(). writel() already has a barrier on
> >>>> some architectures like arm64.
> >>>>
> >>>> This ends up CPU observing two barriers back to back before executing the
> >>>> register write.
> >>>>
> >>>> Since code already has an explicit barrier call, changing writel() to
> >>>> writel_relaxed().
> >>>>
> >>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
> >>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >>>> index 8329ec6..4a6b981 100644
> >>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
> >>>>  
> >>>>  	/* ring CMDQ DB */
> >>>>  	wmb();
> >>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >>>> -	       rcfw->cmdq_bar_reg_prod_off);
> >>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >>>> -	       rcfw->cmdq_bar_reg_trig_off);
> >>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >>>> +		       rcfw->cmdq_bar_reg_prod_off);
> >>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >>>> +		       rcfw->cmdq_bar_reg_trig_off);
> >>>
> >>> Woah, this may not be safe..
> >>>
> >>> The definition of writel_relaxed() is that it is fully unordered, so
> >>> the above two writes may change order now. Broadcom guys would have to
> >>> ack if that it is OK or not for their hardware.
> >>>
> >>> In general this is not an OK approach for a mechanical
> >>> conversion.. Only the first writel can be convereted.
> >>>
> >>> You need to check all your patches to make sure there are no
> >>> subsequent writel's in the places touched.
> >>
> >> I paid special attention to this one and went to check the barriers
> >> document. According to the document, writes (whether it is relaxed or not)
> >> are always observed by the HW inorder with respect to each other.
> > 
> > Oh interesting, that document got revised to make writel_relaxed less
> > relaxed a few years ago, didn't know that. Thanks.
> > 
> > However, this is still not OK, the full code is:
> > 
> >         /* ring CMDQ DB */
> >         wmb();
> >         writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >                rcfw->cmdq_bar_reg_prod_off);
> >         writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >                rcfw->cmdq_bar_reg_trig_off);
> > done:
> >         spin_unlock_irqrestore(&cmdq->lock, flags);
> > 
> > 
> > And the definition of _relaxed allows the writes to order outside the
> > spinlock region, which is very likely to be wrong in this driver.
> > 
> > I'm not sure adding a mmiowb() just to use a writel_relaxed is any
> > sort of win though?
> 
> I'd prefer this. 
> 
> mmiowb() on ARM64 is empty. mmiowb() guarantees that code also works for PPC too.
> 
> I'll switch to this instead so it works for everybody.

It looks like a compiler barrier on x86 so that seems fine too.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 1/6] RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 16:02             ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 20, 2018 at 10:30:34AM -0500, Sinan Kaya wrote:
> On 3/20/2018 10:20 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 20, 2018 at 10:00:49AM -0500, Sinan Kaya wrote:
> >> On 3/20/2018 9:48 AM, Jason Gunthorpe wrote:
> >>> On Mon, Mar 19, 2018 at 10:47:43PM -0400, Sinan Kaya wrote:
> >>>> Code includes wmb() followed by writel(). writel() already has a barrier on
> >>>> some architectures like arm64.
> >>>>
> >>>> This ends up CPU observing two barriers back to back before executing the
> >>>> register write.
> >>>>
> >>>> Since code already has an explicit barrier call, changing writel() to
> >>>> writel_relaxed().
> >>>>
> >>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >>>>  drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 8 ++++----
> >>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >>>> index 8329ec6..4a6b981 100644
> >>>> +++ b/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
> >>>> @@ -181,10 +181,10 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
> >>>>  
> >>>>  	/* ring CMDQ DB */
> >>>>  	wmb();
> >>>> -	writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >>>> -	       rcfw->cmdq_bar_reg_prod_off);
> >>>> -	writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >>>> -	       rcfw->cmdq_bar_reg_trig_off);
> >>>> +	writel_relaxed(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >>>> +		       rcfw->cmdq_bar_reg_prod_off);
> >>>> +	writel_relaxed(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >>>> +		       rcfw->cmdq_bar_reg_trig_off);
> >>>
> >>> Woah, this may not be safe..
> >>>
> >>> The definition of writel_relaxed() is that it is fully unordered, so
> >>> the above two writes may change order now. Broadcom guys would have to
> >>> ack if that it is OK or not for their hardware.
> >>>
> >>> In general this is not an OK approach for a mechanical
> >>> conversion.. Only the first writel can be convereted.
> >>>
> >>> You need to check all your patches to make sure there are no
> >>> subsequent writel's in the places touched.
> >>
> >> I paid special attention to this one and went to check the barriers
> >> document. According to the document, writes (whether it is relaxed or not)
> >> are always observed by the HW inorder with respect to each other.
> > 
> > Oh interesting, that document got revised to make writel_relaxed less
> > relaxed a few years ago, didn't know that. Thanks.
> > 
> > However, this is still not OK, the full code is:
> > 
> >         /* ring CMDQ DB */
> >         wmb();
> >         writel(cmdq_prod, rcfw->cmdq_bar_reg_iomem +
> >                rcfw->cmdq_bar_reg_prod_off);
> >         writel(RCFW_CMDQ_TRIG_VAL, rcfw->cmdq_bar_reg_iomem +
> >                rcfw->cmdq_bar_reg_trig_off);
> > done:
> >         spin_unlock_irqrestore(&cmdq->lock, flags);
> > 
> > 
> > And the definition of _relaxed allows the writes to order outside the
> > spinlock region, which is very likely to be wrong in this driver.
> > 
> > I'm not sure adding a mmiowb() just to use a writel_relaxed is any
> > sort of win though?
> 
> I'd prefer this. 
> 
> mmiowb() on ARM64 is empty. mmiowb() guarantees that code also works for PPC too.
> 
> I'll switch to this instead so it works for everybody.

It looks like a compiler barrier on x86 so that seems fine too.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 16:01         ` Jason Gunthorpe
@ 2018-03-20 16:08           ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 16:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Doug Ledford, linux-kernel

On 3/20/2018 11:01 AM, Jason Gunthorpe wrote:
> On Tue, Mar 20, 2018 at 10:23:16AM -0500, Sinan Kaya wrote:
>> On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
>>> On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
>>>> Code includes barrier() followed by writel(). writel() already has a
>>>> barrier on some architectures like arm64.
>>>>
>>>> This ends up CPU observing two barriers back to back before executing the
>>>> register write.
>>>>
>>>> Create a new wrapper function with relaxed write operator. Use the new
>>>> wrapper when a write is following a barrier().
>>>>
>>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>>>>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>>>>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>>>>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>>>>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>>>>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>>>>  6 files changed, 35 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
>>>> index 00c27291..85e007d 100644
>>>> +++ b/drivers/infiniband/hw/nes/nes.h
>>>> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>>>>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>>>>  }
>>>>  
>>>> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
>>>> +{
>>>> +	writel_relaxed(val, addr);
>>>> +}
>>>
>>> This wrapper is pointless, let us not add more..
>>>
>>>>  static inline void nes_write32(void __iomem *addr, u32 val)
>>>>  {
>>>>  	writel(val, addr);
>>>> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
>>>> index 18a7de1..568e17d 100644
>>>> +++ b/drivers/infiniband/hw/nes/nes_hw.c
>>>> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>>>>  
>>>>  	barrier();
>>>>  	/* Ring doorbell (5 WQEs) */
>>>> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
>>>> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
>>>> +			    0x05800000 | nesdev->cqp.qp_id);
>>>
>>> barrier() is not strong enough to order writel, so this doesn't seem
>>> right?
>>>
>>> It is probably noteven strong enough for what this driver thinks it is
>>> doing..  This driver is essentially dead and broken, probably just
>>> don't change it.
>>
>> Just for the sake of other changes in netdev directory and my education...
>>
>> barrier() on ARM is a wmb(). It should be a compiler barrier on intel.
>>
>> You are saying barrier() should have been a wmb(), right?
> 
> Yes, that is my understanding.. barrier() is supposed to be a very
> weak barrier that just ensures the CPU is locally consistent with
> itself. It doesn't say anything about DMA access, or SMP cases.
> 
> I don't think it is supposed to order anything related to
> writel_relaxed()
> 
>> I have gone through similar exercise on netdev directory and changed
>>
>> barrier()
>> writel()
>>
>> to 
>>
>> barrier()
>> writel_relaxed()
>>
>> Do you see any problem with this?
> 
> Seems dangerous as a mechanical change to me, it really depends on why
> the driver author put barrier() there.

OK. I'll drop those changes.

> 
> In this case, I strongly suspect nes really intended to say wmb()

Should I change barrier() to wmb() or leave it alone? I have no idea if 
nes is actively being maintained or if it is EOL.

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 16:08           ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-20 16:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/20/2018 11:01 AM, Jason Gunthorpe wrote:
> On Tue, Mar 20, 2018 at 10:23:16AM -0500, Sinan Kaya wrote:
>> On 3/20/2018 9:54 AM, Jason Gunthorpe wrote:
>>> On Mon, Mar 19, 2018 at 10:47:47PM -0400, Sinan Kaya wrote:
>>>> Code includes barrier() followed by writel(). writel() already has a
>>>> barrier on some architectures like arm64.
>>>>
>>>> This ends up CPU observing two barriers back to back before executing the
>>>> register write.
>>>>
>>>> Create a new wrapper function with relaxed write operator. Use the new
>>>> wrapper when a write is following a barrier().
>>>>
>>>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>>>>  drivers/infiniband/hw/nes/nes.h       |  5 +++++
>>>>  drivers/infiniband/hw/nes/nes_hw.c    | 21 ++++++++++++++-------
>>>>  drivers/infiniband/hw/nes/nes_mgt.c   | 15 ++++++++++-----
>>>>  drivers/infiniband/hw/nes/nes_nic.c   |  2 +-
>>>>  drivers/infiniband/hw/nes/nes_utils.c |  3 ++-
>>>>  drivers/infiniband/hw/nes/nes_verbs.c |  5 +++--
>>>>  6 files changed, 35 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h
>>>> index 00c27291..85e007d 100644
>>>> +++ b/drivers/infiniband/hw/nes/nes.h
>>>> @@ -387,6 +387,11 @@ static inline void nes_write_indexed(struct nes_device *nesdev, u32 reg_index, u
>>>>  	spin_unlock_irqrestore(&nesdev->indexed_regs_lock, flags);
>>>>  }
>>>>  
>>>> +static inline void nes_write32_relaxed(void __iomem *addr, u32 val)
>>>> +{
>>>> +	writel_relaxed(val, addr);
>>>> +}
>>>
>>> This wrapper is pointless, let us not add more..
>>>
>>>>  static inline void nes_write32(void __iomem *addr, u32 val)
>>>>  {
>>>>  	writel(val, addr);
>>>> diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
>>>> index 18a7de1..568e17d 100644
>>>> +++ b/drivers/infiniband/hw/nes/nes_hw.c
>>>> @@ -1257,7 +1257,8 @@ int nes_destroy_cqp(struct nes_device *nesdev)
>>>>  
>>>>  	barrier();
>>>>  	/* Ring doorbell (5 WQEs) */
>>>> -	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 | nesdev->cqp.qp_id);
>>>> +	nes_write32_relaxed(nesdev->regs+NES_WQE_ALLOC,
>>>> +			    0x05800000 | nesdev->cqp.qp_id);
>>>
>>> barrier() is not strong enough to order writel, so this doesn't seem
>>> right?
>>>
>>> It is probably noteven strong enough for what this driver thinks it is
>>> doing..  This driver is essentially dead and broken, probably just
>>> don't change it.
>>
>> Just for the sake of other changes in netdev directory and my education...
>>
>> barrier() on ARM is a wmb(). It should be a compiler barrier on intel.
>>
>> You are saying barrier() should have been a wmb(), right?
> 
> Yes, that is my understanding.. barrier() is supposed to be a very
> weak barrier that just ensures the CPU is locally consistent with
> itself. It doesn't say anything about DMA access, or SMP cases.
> 
> I don't think it is supposed to order anything related to
> writel_relaxed()
> 
>> I have gone through similar exercise on netdev directory and changed
>>
>> barrier()
>> writel()
>>
>> to 
>>
>> barrier()
>> writel_relaxed()
>>
>> Do you see any problem with this?
> 
> Seems dangerous as a mechanical change to me, it really depends on why
> the driver author put barrier() there.

OK. I'll drop those changes.

> 
> In this case, I strongly suspect nes really intended to say wmb()

Should I change barrier() to wmb() or leave it alone? I have no idea if 
nes is actively being maintained or if it is EOL.

> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20 16:08           ` Sinan Kaya
@ 2018-03-20 16:29             ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:29 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Doug Ledford, linux-kernel

On Tue, Mar 20, 2018 at 11:08:59AM -0500, Sinan Kaya wrote:

> > In this case, I strongly suspect nes really intended to say wmb()
> 
> Should I change barrier() to wmb() or leave it alone? I have no idea if 
> nes is actively being maintained or if it is EOL.

It is dead, and the HW doesn't work on 64 bit machines. Just ignore it
is my advice.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 5/6] IB/nes: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-20 16:29             ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-20 16:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 20, 2018 at 11:08:59AM -0500, Sinan Kaya wrote:

> > In this case, I strongly suspect nes really intended to say wmb()
> 
> Should I change barrier() to wmb() or leave it alone? I have no idea if 
> nes is actively being maintained or if it is EOL.

It is dead, and the HW doesn't work on 64 bit machines. Just ignore it
is my advice.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-21 13:38     ` Shiraz Saleem
  -1 siblings, 0 replies; 100+ messages in thread
From: Shiraz Saleem @ 2018-03-21 13:38 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Latif, Faisal, Doug Ledford, Jason Gunthorpe, linux-kernel

On Mon, Mar 19, 2018 at 08:47:45PM -0600, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-21 13:38     ` Shiraz Saleem
  0 siblings, 0 replies; 100+ messages in thread
From: Shiraz Saleem @ 2018-03-21 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 08:47:45PM -0600, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
@ 2018-03-21 20:02     ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-21 20:02 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Shiraz Saleem, Doug Ledford, linux-kernel

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> index 8afa5a6..7f0ebed 100644
> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>  
>  	wmb(); /* make sure WQE is populated before valid bit is set */
>  
> -	writel(cq->cq_id, cq->cqe_alloc_reg);
> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>  }

Ah, this one is probably not OK, i40iw_cq_request_notification is
called here:

	spin_lock_irqsave(&iwcq->lock, flags);
	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
	spin_unlock_irqrestore(&iwcq->lock, flags);

So this needs to add mmmiomb(); to keep the same semantics.

Generally I think you need to be very careful to ensure that any
conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
is present.

Maybe even do a first series with this obviously correct pattern:

 wmb();
 writel() -> writel_relaxed()
 writel() -> writel_relaxed()
 [..]
 mmiowmb();

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-21 20:02     ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-21 20:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> index 8afa5a6..7f0ebed 100644
> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>  
>  	wmb(); /* make sure WQE is populated before valid bit is set */
>  
> -	writel(cq->cq_id, cq->cqe_alloc_reg);
> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>  }

Ah, this one is probably not OK, i40iw_cq_request_notification is
called here:

	spin_lock_irqsave(&iwcq->lock, flags);
	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
	spin_unlock_irqrestore(&iwcq->lock, flags);

So this needs to add mmmiomb(); to keep the same semantics.

Generally I think you need to be very careful to ensure that any
conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
is present.

Maybe even do a first series with this obviously correct pattern:

 wmb();
 writel() -> writel_relaxed()
 writel() -> writel_relaxed()
 [..]
 mmiowmb();

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47 ` Sinan Kaya
@ 2018-03-21 20:08   ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-21 20:08 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: linux-rdma, linux-arm-msm, timur, sulrich, linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:42PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Changes since v3:
> - https://www.spinics.net/lists/arm-kernel/msg641851.html
> - group patches together into subsystems ib:...
> - collect reviewed and tested bys
> - scrub barrier()

> Sinan Kaya (6):
>   IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
>   infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered
>     archs

I took these two patches to for-next since they seem OK to me

>   RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
>   RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
>   IB/nes: Eliminate duplicate barriers on weakly-ordered archs
>   RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2

These will need their comments addressed

Thanks,
Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-21 20:08   ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-21 20:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 19, 2018 at 10:47:42PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Changes since v3:
> - https://www.spinics.net/lists/arm-kernel/msg641851.html
> - group patches together into subsystems ib:...
> - collect reviewed and tested bys
> - scrub barrier()

> Sinan Kaya (6):
>   IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
>   infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered
>     archs

I took these two patches to for-next since they seem OK to me

>   RDMA/bnxt_re: Eliminate duplicate barriers on weakly-ordered archs
>   RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
>   IB/nes: Eliminate duplicate barriers on weakly-ordered archs
>   RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2

These will need their comments addressed

Thanks,
Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-21 20:02     ` Jason Gunthorpe
@ 2018-03-21 21:01       ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-21 21:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Faisal Latif, Shiraz Saleem, Doug Ledford, linux-kernel

On 3/21/2018 3:02 PM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
>> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> index 8afa5a6..7f0ebed 100644
>> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>>  
>>  	wmb(); /* make sure WQE is populated before valid bit is set */
>>  
>> -	writel(cq->cq_id, cq->cqe_alloc_reg);
>> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>>  }
> 
> Ah, this one is probably not OK, i40iw_cq_request_notification is
> called here:
> 
> 	spin_lock_irqsave(&iwcq->lock, flags);
> 	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
> 	spin_unlock_irqrestore(&iwcq->lock, flags);
> 
> So this needs to add mmmiomb(); to keep the same semantics.
> 
> Generally I think you need to be very careful to ensure that any
> conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
> is present.
> 
> Maybe even do a first series with this obviously correct pattern:
> 
>  wmb();
>  writel() -> writel_relaxed()
>  writel() -> writel_relaxed()
>  [..]
>  mmiowmb();

Good catch. I changed it as follows:

+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,8 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,

        wmb(); /* make sure WQE is populated before valid bit is set */

-       writel(cq->cq_id, cq->cqe_alloc_reg);
+       writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
+       mmiowb();
 }


> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-21 21:01       ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-21 21:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/21/2018 3:02 PM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
>> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> index 8afa5a6..7f0ebed 100644
>> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>>  
>>  	wmb(); /* make sure WQE is populated before valid bit is set */
>>  
>> -	writel(cq->cq_id, cq->cqe_alloc_reg);
>> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>>  }
> 
> Ah, this one is probably not OK, i40iw_cq_request_notification is
> called here:
> 
> 	spin_lock_irqsave(&iwcq->lock, flags);
> 	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
> 	spin_unlock_irqrestore(&iwcq->lock, flags);
> 
> So this needs to add mmmiomb(); to keep the same semantics.
> 
> Generally I think you need to be very careful to ensure that any
> conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
> is present.
> 
> Maybe even do a first series with this obviously correct pattern:
> 
>  wmb();
>  writel() -> writel_relaxed()
>  writel() -> writel_relaxed()
>  [..]
>  mmiowmb();

Good catch. I changed it as follows:

+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,8 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,

        wmb(); /* make sure WQE is populated before valid bit is set */

-       writel(cq->cq_id, cq->cqe_alloc_reg);
+       writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
+       mmiowb();
 }


> 
> Jason
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
  (?)
@ 2018-03-22  6:44     ` kbuild test robot
  -1 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-22  6:44 UTC (permalink / raw)
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Sinan Kaya, Steve Wise, Doug Ledford,
	Jason Gunthorpe, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180321]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writel_relaxed'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      writel_relaxed
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 53070 bytes --]

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22  6:44     ` kbuild test robot
  0 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-22  6:44 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Sinan Kaya, Steve Wise, Doug Ledford,
	Jason Gunthorpe, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180321]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writel_relaxed'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      writel_relaxed
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 53070 bytes --]

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22  6:44     ` kbuild test robot
  0 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-22  6:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180321]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writel_relaxed'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      writel_relaxed
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 53070 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180322/6a0011b7/attachment-0001.gz>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22  6:44     ` kbuild test robot
@ 2018-03-22 12:24       ` okaya at codeaurora.org
  -1 siblings, 0 replies; 100+ messages in thread
From: okaya @ 2018-03-22 12:24 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, Doug Ledford, Jason Gunthorpe,
	linux-kernel

On 2018-03-22 02:44, kbuild test robot wrote:
> Hi Sinan,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.16-rc6 next-20180321]
> [if your patch is applied to the wrong git tree, please drop us a note
> to help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> config: xtensa-allyesconfig (attached as .config)
> compiler: xtensa-linux-gcc (GCC) 7.2.0
> reproduce:
>         wget
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=xtensa
> 

Jason,

Can you remove the writeq change if it is too late for me to fix?

This is an infrastructural issue on xtensa arch.

Probably, it won't get fixed today.

Sinan


> All errors (new ones prefixed by >>):
> 
>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
>                     from drivers/infiniband/hw/cxgb4/device.c:40:
>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration 
>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'? 
>>> [-Werror=implicit-function-declaration]
>       writeq_relaxed(*src, dst);
>       ^~~~~~~~~~~~~~
>       writel_relaxed
>    cc1: some warnings being treated as errors
> 
> vim +460 drivers/infiniband/hw/cxgb4/t4.h
> 
>    450
>    451	/* This function copies 64 byte coalesced work request to memory
>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
>    453	 * from the FIFO instead of from Host.
>    454	 */
>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
>    456	{
>    457		int count = 8;
>    458
>    459		while (count) {
>  > 460			writeq_relaxed(*src, dst);
>    461			src++;
>    462			dst++;
>    463			count--;
>    464		}
>    465	}
>    466
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology 
> Center
> https://lists.01.org/pipermail/kbuild-all                   Intel 
> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 12:24       ` okaya at codeaurora.org
  0 siblings, 0 replies; 100+ messages in thread
From: okaya at codeaurora.org @ 2018-03-22 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-03-22 02:44, kbuild test robot wrote:
> Hi Sinan,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.16-rc6 next-20180321]
> [if your patch is applied to the wrong git tree, please drop us a note
> to help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> config: xtensa-allyesconfig (attached as .config)
> compiler: xtensa-linux-gcc (GCC) 7.2.0
> reproduce:
>         wget
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=xtensa
> 

Jason,

Can you remove the writeq change if it is too late for me to fix?

This is an infrastructural issue on xtensa arch.

Probably, it won't get fixed today.

Sinan


> All errors (new ones prefixed by >>):
> 
>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
>                     from drivers/infiniband/hw/cxgb4/device.c:40:
>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration 
>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'? 
>>> [-Werror=implicit-function-declaration]
>       writeq_relaxed(*src, dst);
>       ^~~~~~~~~~~~~~
>       writel_relaxed
>    cc1: some warnings being treated as errors
> 
> vim +460 drivers/infiniband/hw/cxgb4/t4.h
> 
>    450
>    451	/* This function copies 64 byte coalesced work request to memory
>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
>    453	 * from the FIFO instead of from Host.
>    454	 */
>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
>    456	{
>    457		int count = 8;
>    458
>    459		while (count) {
>  > 460			writeq_relaxed(*src, dst);
>    461			src++;
>    462			dst++;
>    463			count--;
>    464		}
>    465	}
>    466
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology 
> Center
> https://lists.01.org/pipermail/kbuild-all                   Intel 
> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 12:24       ` okaya at codeaurora.org
@ 2018-03-22 12:48         ` okaya at codeaurora.org
  -1 siblings, 0 replies; 100+ messages in thread
From: okaya @ 2018-03-22 12:48 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, Doug Ledford, Jason Gunthorpe,
	linux-kernel

On 2018-03-22 08:24, okaya@codeaurora.org wrote:
> On 2018-03-22 02:44, kbuild test robot wrote:
>> Hi Sinan,
>> 
>> Thank you for the patch! Yet something to improve:
>> 
>> [auto build test ERROR on linus/master]
>> [also build test ERROR on v4.16-rc6 next-20180321]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improve the system]
>> 
>> url:
>> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
>> config: xtensa-allyesconfig (attached as .config)
>> compiler: xtensa-linux-gcc (GCC) 7.2.0
>> reproduce:
>>         wget
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
>> -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # save the attached .config to linux build tree
>>         make.cross ARCH=xtensa
>> 
> 
> Jason,
> 
> Can you remove the writeq change if it is too late for me to fix?
> 
> This is an infrastructural issue on xtensa arch.
> 
> Probably, it won't get fixed today.

AFAIS, even writeq won't compile on this arch. I started questioning 
this build test.


> 
> Sinan
> 
> 
>> All errors (new ones prefixed by >>):
>> 
>>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
>>                     from drivers/infiniband/hw/cxgb4/device.c:40:
>>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration 
>>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'? 
>>>> [-Werror=implicit-function-declaration]
>>       writeq_relaxed(*src, dst);
>>       ^~~~~~~~~~~~~~
>>       writel_relaxed
>>    cc1: some warnings being treated as errors
>> 
>> vim +460 drivers/infiniband/hw/cxgb4/t4.h
>> 
>>    450
>>    451	/* This function copies 64 byte coalesced work request to 
>> memory
>>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
>>    453	 * from the FIFO instead of from Host.
>>    454	 */
>>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
>>    456	{
>>    457		int count = 8;
>>    458
>>    459		while (count) {
>>  > 460			writeq_relaxed(*src, dst);
>>    461			src++;
>>    462			dst++;
>>    463			count--;
>>    464		}
>>    465	}
>>    466
>> 
>> ---
>> 0-DAY kernel test infrastructure                Open Source Technology 
>> Center
>> https://lists.01.org/pipermail/kbuild-all                   Intel 
>> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 12:48         ` okaya at codeaurora.org
  0 siblings, 0 replies; 100+ messages in thread
From: okaya at codeaurora.org @ 2018-03-22 12:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-03-22 08:24, okaya at codeaurora.org wrote:
> On 2018-03-22 02:44, kbuild test robot wrote:
>> Hi Sinan,
>> 
>> Thank you for the patch! Yet something to improve:
>> 
>> [auto build test ERROR on linus/master]
>> [also build test ERROR on v4.16-rc6 next-20180321]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improve the system]
>> 
>> url:
>> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
>> config: xtensa-allyesconfig (attached as .config)
>> compiler: xtensa-linux-gcc (GCC) 7.2.0
>> reproduce:
>>         wget
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
>> -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # save the attached .config to linux build tree
>>         make.cross ARCH=xtensa
>> 
> 
> Jason,
> 
> Can you remove the writeq change if it is too late for me to fix?
> 
> This is an infrastructural issue on xtensa arch.
> 
> Probably, it won't get fixed today.

AFAIS, even writeq won't compile on this arch. I started questioning 
this build test.


> 
> Sinan
> 
> 
>> All errors (new ones prefixed by >>):
>> 
>>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
>>                     from drivers/infiniband/hw/cxgb4/device.c:40:
>>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration 
>>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'? 
>>>> [-Werror=implicit-function-declaration]
>>       writeq_relaxed(*src, dst);
>>       ^~~~~~~~~~~~~~
>>       writel_relaxed
>>    cc1: some warnings being treated as errors
>> 
>> vim +460 drivers/infiniband/hw/cxgb4/t4.h
>> 
>>    450
>>    451	/* This function copies 64 byte coalesced work request to 
>> memory
>>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
>>    453	 * from the FIFO instead of from Host.
>>    454	 */
>>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
>>    456	{
>>    457		int count = 8;
>>    458
>>    459		while (count) {
>>  > 460			writeq_relaxed(*src, dst);
>>    461			src++;
>>    462			dst++;
>>    463			count--;
>>    464		}
>>    465	}
>>    466
>> 
>> ---
>> 0-DAY kernel test infrastructure                Open Source Technology 
>> Center
>> https://lists.01.org/pipermail/kbuild-all                   Intel 
>> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 12:48         ` okaya at codeaurora.org
@ 2018-03-22 14:33           ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 14:33 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, Doug Ledford, Jason Gunthorpe,
	linux-kernel

One more time hopefully without HTML this time.


On 03/22/2018 08:48 AM, okaya@codeaurora.org wrote:
>> Can you remove the writeq change if it is too late for me to fix?
>>
>> This is an infrastructural issue on xtensa arch.
>>
>> Probably, it won't get fixed today.
>
> AFAIS, even writeq won't compile on this arch. I started questioning 
> this build test.

I found out that the solution is this:

#include <linux/io-64-nonatomic-hi-lo.h>

https://patchwork.ozlabs.org/patch/511801/

I did a compile test with this change on xtensa and it passed. I'll 
repost with the added diff.

+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -46,7 +46,7 @@
  #include <linux/timer.h>
  #include <linux/io.h>
  #include <linux/workqueue.h>
-
+#include <linux/io-64-nonatomic-hi-lo.h>
  #include <asm/byteorder.h>

  #include <net/net_namespace.h>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 14:33           ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

One more time hopefully without HTML this time.


On 03/22/2018 08:48 AM, okaya at codeaurora.org wrote:
>> Can you remove the writeq change if it is too late for me to fix?
>>
>> This is an infrastructural issue on xtensa arch.
>>
>> Probably, it won't get fixed today.
>
> AFAIS, even writeq won't compile on this arch. I started questioning 
> this build test.

I found out that the solution is this:

#include <linux/io-64-nonatomic-hi-lo.h>

https://patchwork.ozlabs.org/patch/511801/

I did a compile test with this change on xtensa and it passed. I'll 
repost with the added diff.

+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -46,7 +46,7 @@
 ?#include <linux/timer.h>
 ?#include <linux/io.h>
 ?#include <linux/workqueue.h>
-
+#include <linux/io-64-nonatomic-hi-lo.h>
 ?#include <asm/byteorder.h>

 ?#include <net/net_namespace.h>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 12:48         ` okaya at codeaurora.org
  (?)
@ 2018-03-22 14:40           ` Steve Wise
  -1 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 14:40 UTC (permalink / raw)
  To: okaya, 'kbuild test robot'
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, 'Steve Wise', 'Doug Ledford',
	'Jason Gunthorpe', linux-kernel, 'Casey Leedom',
	'Michael Werner'

> 
> On 2018-03-22 08:24, okaya@codeaurora.org wrote:
> > On 2018-03-22 02:44, kbuild test robot wrote:
> >> Hi Sinan,
> >>
> >> Thank you for the patch! Yet something to improve:
> >>
> >> [auto build test ERROR on linus/master]
> >> [also build test ERROR on v4.16-rc6 next-20180321]
> >> [if your patch is applied to the wrong git tree, please drop us a note
> >> to help improve the system]
> >>
> >> url:
> >> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-
> duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> >> config: xtensa-allyesconfig (attached as .config)
> >> compiler: xtensa-linux-gcc (GCC) 7.2.0
> >> reproduce:
> >>         wget
> >> https://raw.githubusercontent.com/intel/lkp-
> tests/master/sbin/make.cross
> >> -O ~/bin/make.cross
> >>         chmod +x ~/bin/make.cross
> >>         # save the attached .config to linux build tree
> >>         make.cross ARCH=xtensa
> >>
> >
> > Jason,
> >
> > Can you remove the writeq change if it is too late for me to fix?
> >
> > This is an infrastructural issue on xtensa arch.
> >
> > Probably, it won't get fixed today.
> 
> AFAIS, even writeq won't compile on this arch. I started questioning
> this build test.
> 

I think all these iw_cxgb4 changes should be reverted until we really have a
plan for multi-platform that works.  

> 
> >
> > Sinan
> >
> >
> >> All errors (new ones prefixed by >>):
> >>
> >>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
> >>                     from drivers/infiniband/hw/cxgb4/device.c:40:
> >>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
> >>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration
> >>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'?
> >>>> [-Werror=implicit-function-declaration]
> >>       writeq_relaxed(*src, dst);
> >>       ^~~~~~~~~~~~~~
> >>       writel_relaxed
> >>    cc1: some warnings being treated as errors
> >>
> >> vim +460 drivers/infiniband/hw/cxgb4/t4.h
> >>
> >>    450
> >>    451	/* This function copies 64 byte coalesced work request to
> >> memory
> >>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches
> data
> >>    453	 * from the FIFO instead of from Host.
> >>    454	 */
> >>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
> >>    456	{
> >>    457		int count = 8;
> >>    458
> >>    459		while (count) {
> >>  > 460			writeq_relaxed(*src, dst);
> >>    461			src++;
> >>    462			dst++;
> >>    463			count--;
> >>    464		}
> >>    465	}
> >>    466
> >>
> >> ---
> >> 0-DAY kernel test infrastructure                Open Source Technology
> >> Center
> >> https://lists.01.org/pipermail/kbuild-all                   Intel
> >> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 14:40           ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 14:40 UTC (permalink / raw)
  To: okaya, 'kbuild test robot'
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, 'Steve Wise', 'Doug Ledford',
	'Jason Gunthorpe', linux-kernel, 'Casey Leedom',
	'Michael Werner'

> 
> On 2018-03-22 08:24, okaya@codeaurora.org wrote:
> > On 2018-03-22 02:44, kbuild test robot wrote:
> >> Hi Sinan,
> >>
> >> Thank you for the patch! Yet something to improve:
> >>
> >> [auto build test ERROR on linus/master]
> >> [also build test ERROR on v4.16-rc6 next-20180321]
> >> [if your patch is applied to the wrong git tree, please drop us a note
> >> to help improve the system]
> >>
> >> url:
> >> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-
> duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> >> config: xtensa-allyesconfig (attached as .config)
> >> compiler: xtensa-linux-gcc (GCC) 7.2.0
> >> reproduce:
> >>         wget
> >> https://raw.githubusercontent.com/intel/lkp-
> tests/master/sbin/make.cross
> >> -O ~/bin/make.cross
> >>         chmod +x ~/bin/make.cross
> >>         # save the attached .config to linux build tree
> >>         make.cross ARCH=xtensa
> >>
> >
> > Jason,
> >
> > Can you remove the writeq change if it is too late for me to fix?
> >
> > This is an infrastructural issue on xtensa arch.
> >
> > Probably, it won't get fixed today.
> 
> AFAIS, even writeq won't compile on this arch. I started questioning
> this build test.
> 

I think all these iw_cxgb4 changes should be reverted until we really have a
plan for multi-platform that works.  

> 
> >
> > Sinan
> >
> >
> >> All errors (new ones prefixed by >>):
> >>
> >>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
> >>                     from drivers/infiniband/hw/cxgb4/device.c:40:
> >>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
> >>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration
> >>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'?
> >>>> [-Werror=implicit-function-declaration]
> >>       writeq_relaxed(*src, dst);
> >>       ^~~~~~~~~~~~~~
> >>       writel_relaxed
> >>    cc1: some warnings being treated as errors
> >>
> >> vim +460 drivers/infiniband/hw/cxgb4/t4.h
> >>
> >>    450
> >>    451	/* This function copies 64 byte coalesced work request to
> >> memory
> >>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches
> data
> >>    453	 * from the FIFO instead of from Host.
> >>    454	 */
> >>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
> >>    456	{
> >>    457		int count = 8;
> >>    458
> >>    459		while (count) {
> >>  > 460			writeq_relaxed(*src, dst);
> >>    461			src++;
> >>    462			dst++;
> >>    463			count--;
> >>    464		}
> >>    465	}
> >>    466
> >>
> >> ---
> >> 0-DAY kernel test infrastructure                Open Source Technology
> >> Center
> >> https://lists.01.org/pipermail/kbuild-all                   Intel
> >> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 14:40           ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

> 
> On 2018-03-22 08:24, okaya at codeaurora.org wrote:
> > On 2018-03-22 02:44, kbuild test robot wrote:
> >> Hi Sinan,
> >>
> >> Thank you for the patch! Yet something to improve:
> >>
> >> [auto build test ERROR on linus/master]
> >> [also build test ERROR on v4.16-rc6 next-20180321]
> >> [if your patch is applied to the wrong git tree, please drop us a note
> >> to help improve the system]
> >>
> >> url:
> >> https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-
> duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> >> config: xtensa-allyesconfig (attached as .config)
> >> compiler: xtensa-linux-gcc (GCC) 7.2.0
> >> reproduce:
> >>         wget
> >> https://raw.githubusercontent.com/intel/lkp-
> tests/master/sbin/make.cross
> >> -O ~/bin/make.cross
> >>         chmod +x ~/bin/make.cross
> >>         # save the attached .config to linux build tree
> >>         make.cross ARCH=xtensa
> >>
> >
> > Jason,
> >
> > Can you remove the writeq change if it is too late for me to fix?
> >
> > This is an infrastructural issue on xtensa arch.
> >
> > Probably, it won't get fixed today.
> 
> AFAIS, even writeq won't compile on this arch. I started questioning
> this build test.
> 

I think all these iw_cxgb4 changes should be reverted until we really have a
plan for multi-platform that works.  

> 
> >
> > Sinan
> >
> >
> >> All errors (new ones prefixed by >>):
> >>
> >>    In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
> >>                     from drivers/infiniband/hw/cxgb4/device.c:40:
> >>    drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
> >>>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration
> >>>> of function 'writeq_relaxed'; did you mean 'writel_relaxed'?
> >>>> [-Werror=implicit-function-declaration]
> >>       writeq_relaxed(*src, dst);
> >>       ^~~~~~~~~~~~~~
> >>       writel_relaxed
> >>    cc1: some warnings being treated as errors
> >>
> >> vim +460 drivers/infiniband/hw/cxgb4/t4.h
> >>
> >>    450
> >>    451	/* This function copies 64 byte coalesced work request to
> >> memory
> >>    452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches
> data
> >>    453	 * from the FIFO instead of from Host.
> >>    454	 */
> >>    455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
> >>    456	{
> >>    457		int count = 8;
> >>    458
> >>    459		while (count) {
> >>  > 460			writeq_relaxed(*src, dst);
> >>    461			src++;
> >>    462			dst++;
> >>    463			count--;
> >>    464		}
> >>    465	}
> >>    466
> >>
> >> ---
> >> 0-DAY kernel test infrastructure                Open Source Technology
> >> Center
> >> https://lists.01.org/pipermail/kbuild-all                   Intel
> >> Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 14:40           ` Steve Wise
@ 2018-03-22 14:52             ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 14:52 UTC (permalink / raw)
  To: Steve Wise, 'kbuild test robot'
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, 'Steve Wise', 'Doug Ledford',
	'Jason Gunthorpe', linux-kernel, 'Casey Leedom',
	'Michael Werner'

On 3/22/2018 9:40 AM, Steve Wise wrote:
> I think all these iw_cxgb4 changes should be reverted until we really have a
> plan for multi-platform that works.  

I know you are looking to have support for PowerPC. 

Isn't this a PowerPC problem? Why penalize other architectures?

Do you see anything wrong with the code itself?

I started this thread with the PowerPC develoeprs on your request.
"RFC on writel and writel_relaxed"

They are looking into adding the relaxed API support. Support can come
in later. Why block this change now?

benh@kernel.crashing.org:
"I've been wanting to implement the relaxed accessors for a while but
was battling with this to try to also better support WC, and due to
other commitments, this somewhat fell down the cracks."

I have seen four different responses on this thread. Since this is an
architecture change it will take a while to get the semantics right.
It won't happen in the new few days.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 14:52             ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 14:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/22/2018 9:40 AM, Steve Wise wrote:
> I think all these iw_cxgb4 changes should be reverted until we really have a
> plan for multi-platform that works.  

I know you are looking to have support for PowerPC. 

Isn't this a PowerPC problem? Why penalize other architectures?

Do you see anything wrong with the code itself?

I started this thread with the PowerPC develoeprs on your request.
"RFC on writel and writel_relaxed"

They are looking into adding the relaxed API support. Support can come
in later. Why block this change now?

benh at kernel.crashing.org:
"I've been wanting to implement the relaxed accessors for a while but
was battling with this to try to also better support WC, and due to
other commitments, this somewhat fell down the cracks."

I have seen four different responses on this thread. Since this is an
architecture change it will take a while to get the semantics right.
It won't happen in the new few days.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 14:52             ` Sinan Kaya
  (?)
@ 2018-03-22 16:28               ` Steve Wise
  -1 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 16:28 UTC (permalink / raw)
  To: Sinan Kaya, 'kbuild test robot'
  Cc: sulrich, 'Casey Leedom', 'Steve Wise',
	linux-rdma, linux-arm-msm, timur, linux-kernel,
	'Jason Gunthorpe', 'Doug Ledford',
	kbuild-all, 'Michael Werner',
	linux-arm-kernel



On 3/22/2018 9:52 AM, Sinan Kaya wrote:
> On 3/22/2018 9:40 AM, Steve Wise wrote:
>> I think all these iw_cxgb4 changes should be reverted until we really have a
>> plan for multi-platform that works.  
> I know you are looking to have support for PowerPC. 
>
> Isn't this a PowerPC problem? Why penalize other architectures?
>
> Do you see anything wrong with the code itself?

I worry it breaks PPC.

> I started this thread with the PowerPC develoeprs on your request.
> "RFC on writel and writel_relaxed"
>
> They are looking into adding the relaxed API support. Support can come
> in later. Why block this change now?
>
> benh@kernel.crashing.org:
> "I've been wanting to implement the relaxed accessors for a while but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell down the cracks."
>
> I have seen four different responses on this thread. Since this is an
> architecture change it will take a while to get the semantics right.
> It won't happen in the new few days.

I appreciate you doing this.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 16:28               ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 16:28 UTC (permalink / raw)
  To: Sinan Kaya, 'kbuild test robot'
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, 'Steve Wise', 'Doug Ledford',
	'Jason Gunthorpe', linux-kernel, 'Casey Leedom',
	'Michael Werner'



On 3/22/2018 9:52 AM, Sinan Kaya wrote:
> On 3/22/2018 9:40 AM, Steve Wise wrote:
>> I think all these iw_cxgb4 changes should be reverted until we really have a
>> plan for multi-platform that works.  
> I know you are looking to have support for PowerPC. 
>
> Isn't this a PowerPC problem? Why penalize other architectures?
>
> Do you see anything wrong with the code itself?

I worry it breaks PPC.

> I started this thread with the PowerPC develoeprs on your request.
> "RFC on writel and writel_relaxed"
>
> They are looking into adding the relaxed API support. Support can come
> in later. Why block this change now?
>
> benh@kernel.crashing.org:
> "I've been wanting to implement the relaxed accessors for a while but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell down the cracks."
>
> I have seen four different responses on this thread. Since this is an
> architecture change it will take a while to get the semantics right.
> It won't happen in the new few days.

I appreciate you doing this.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 16:28               ` Steve Wise
  0 siblings, 0 replies; 100+ messages in thread
From: Steve Wise @ 2018-03-22 16:28 UTC (permalink / raw)
  To: linux-arm-kernel



On 3/22/2018 9:52 AM, Sinan Kaya wrote:
> On 3/22/2018 9:40 AM, Steve Wise wrote:
>> I think all these iw_cxgb4 changes should be reverted until we really have a
>> plan for multi-platform that works.  
> I know you are looking to have support for PowerPC. 
>
> Isn't this a PowerPC problem? Why penalize other architectures?
>
> Do you see anything wrong with the code itself?

I worry it breaks PPC.

> I started this thread with the PowerPC develoeprs on your request.
> "RFC on writel and writel_relaxed"
>
> They are looking into adding the relaxed API support. Support can come
> in later. Why block this change now?
>
> benh at kernel.crashing.org:
> "I've been wanting to implement the relaxed accessors for a while but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell down the cracks."
>
> I have seen four different responses on this thread. Since this is an
> architecture change it will take a while to get the semantics right.
> It won't happen in the new few days.

I appreciate you doing this.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
       [not found]         ` <437ab002-b8db-24aa-583e-0e61d61aaa97@codeaurora.org>
@ 2018-03-22 18:46             ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 18:46 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: kbuild test robot, kbuild-all, linux-rdma, timur, sulrich,
	linux-arm-msm, linux-arm-kernel, Steve Wise, Doug Ledford,
	linux-kernel

On Thu, Mar 22, 2018 at 10:28:13AM -0400, Sinan Kaya wrote:
>    On 03/22/2018 08:48 AM, [1]okaya@codeaurora.org wrote:
> 
>      Jason,
>      Can you remove the writeq change if it is too late for me to fix?
>      This is an infrastructural issue on xtensa arch.
>      Probably, it won't get fixed today.

I was able to drop the patch, please resend.

>      AFAIS, even writeq won't compile on this arch. I started questioning
>      this build test.

>    I found out that the solution is this:
> #include <linux/io-64-nonatomic-hi-lo.h>

Yuk, what an ugly API..

>    [2]https://patchwork.ozlabs.org/patch/511801/
>    I did a compile test with this change on xtensa and it passed. I'll
>    repost with the added diff.
>    +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
>    @@ -46,7 +46,7 @@
>     #include <linux/timer.h>
>     #include <linux/io.h>
>     #include <linux/workqueue.h>
>    -
>    +#include <linux/io-64-nonatomic-hi-lo.h>

I think this is the wrong one. I would expect all PCI-E devices should
use lo-hi, eg writes are done in address increasing order on the bus.

This is what the PCI-E spec would require if a write were to be
fragmented so I would expect devices to handle it properly.

Steve? Any idea of specific things for this HW?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 18:46             ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 18:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 22, 2018 at 10:28:13AM -0400, Sinan Kaya wrote:
>    On 03/22/2018 08:48 AM, [1]okaya at codeaurora.org wrote:
> 
>      Jason,
>      Can you remove the writeq change if it is too late for me to fix?
>      This is an infrastructural issue on xtensa arch.
>      Probably, it won't get fixed today.

I was able to drop the patch, please resend.

>      AFAIS, even writeq won't compile on this arch. I started questioning
>      this build test.

>    I found out that the solution is this:
> #include <linux/io-64-nonatomic-hi-lo.h>

Yuk, what an ugly API..

>    [2]https://patchwork.ozlabs.org/patch/511801/
>    I did a compile test with this change on xtensa and it passed. I'll
>    repost with the added diff.
>    +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
>    @@ -46,7 +46,7 @@
>     #include <linux/timer.h>
>     #include <linux/io.h>
>     #include <linux/workqueue.h>
>    -
>    +#include <linux/io-64-nonatomic-hi-lo.h>

I think this is the wrong one. I would expect all PCI-E devices should
use lo-hi, eg writes are done in address increasing order on the bus.

This is what the PCI-E spec would require if a write were to be
fragmented so I would expect devices to handle it properly.

Steve? Any idea of specific things for this HW?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 12:48         ` okaya at codeaurora.org
@ 2018-03-22 18:48           ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 18:48 UTC (permalink / raw)
  To: okaya
  Cc: kbuild test robot, kbuild-all, linux-rdma, timur, sulrich,
	linux-arm-msm, linux-arm-kernel, Steve Wise, Doug Ledford,
	linux-kernel

On Thu, Mar 22, 2018 at 08:48:35AM -0400, okaya@codeaurora.org wrote:
> On 2018-03-22 08:24, okaya@codeaurora.org wrote:
> >On 2018-03-22 02:44, kbuild test robot wrote:
> >>Hi Sinan,
> >>
> >>Thank you for the patch! Yet something to improve:
> >>
> >>[auto build test ERROR on linus/master]
> >>[also build test ERROR on v4.16-rc6 next-20180321]
> >>[if your patch is applied to the wrong git tree, please drop us a note
> >>to help improve the system]
> >>
> >>url:
> >>https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> >>config: xtensa-allyesconfig (attached as .config)
> >>compiler: xtensa-linux-gcc (GCC) 7.2.0
> >>reproduce:
> >>        wget
> >>https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> >>-O ~/bin/make.cross
> >>        chmod +x ~/bin/make.cross
> >>        # save the attached .config to linux build tree
> >>        make.cross ARCH=xtensa
> >>
> >
> >Jason,
> >
> >Can you remove the writeq change if it is too late for me to fix?
> >
> >This is an infrastructural issue on xtensa arch.
> >
> >Probably, it won't get fixed today.
> 
> AFAIS, even writeq won't compile on this arch. I started questioning this
> build test.

I have the same confusion. Did you figure out an explanation?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 18:48           ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 18:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 22, 2018 at 08:48:35AM -0400, okaya at codeaurora.org wrote:
> On 2018-03-22 08:24, okaya at codeaurora.org wrote:
> >On 2018-03-22 02:44, kbuild test robot wrote:
> >>Hi Sinan,
> >>
> >>Thank you for the patch! Yet something to improve:
> >>
> >>[auto build test ERROR on linus/master]
> >>[also build test ERROR on v4.16-rc6 next-20180321]
> >>[if your patch is applied to the wrong git tree, please drop us a note
> >>to help improve the system]
> >>
> >>url:
> >>https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
> >>config: xtensa-allyesconfig (attached as .config)
> >>compiler: xtensa-linux-gcc (GCC) 7.2.0
> >>reproduce:
> >>        wget
> >>https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> >>-O ~/bin/make.cross
> >>        chmod +x ~/bin/make.cross
> >>        # save the attached .config to linux build tree
> >>        make.cross ARCH=xtensa
> >>
> >
> >Jason,
> >
> >Can you remove the writeq change if it is too late for me to fix?
> >
> >This is an infrastructural issue on xtensa arch.
> >
> >Probably, it won't get fixed today.
> 
> AFAIS, even writeq won't compile on this arch. I started questioning this
> build test.

I have the same confusion. Did you figure out an explanation?

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 18:48           ` Jason Gunthorpe
@ 2018-03-22 18:58             ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 18:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kbuild test robot, kbuild-all, linux-rdma, timur, sulrich,
	linux-arm-msm, linux-arm-kernel, Steve Wise, Doug Ledford,
	linux-kernel

On 3/22/2018 1:48 PM, Jason Gunthorpe wrote:
>> AFAIS, even writeq won't compile on this arch. I started questioning this
>> build test.
> I have the same confusion. Did you figure out an explanation?

I did a compile test without the relaxed change. It built just fine.
CONFIG_64BIT is also not defined in the .config file from kbuild.
I also dropped this patch on v5 due to Steve's request.

However, I think I still need an answer for 
v6: RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
I can switch to #include <linux/io-64-nonatomic-lo-hi.h> there.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 18:58             ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 18:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/22/2018 1:48 PM, Jason Gunthorpe wrote:
>> AFAIS, even writeq won't compile on this arch. I started questioning this
>> build test.
> I have the same confusion. Did you figure out an explanation?

I did a compile test without the relaxed change. It built just fine.
CONFIG_64BIT is also not defined in the .config file from kbuild.
I also dropped this patch on v5 due to Steve's request.

However, I think I still need an answer for 
v6: RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2
I can switch to #include <linux/io-64-nonatomic-lo-hi.h> there.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 16:28               ` Steve Wise
@ 2018-03-22 19:44                 ` Casey Leedom
  -1 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 19:44 UTC (permalink / raw)
  To: SWise OGC, Sinan Kaya, 'kbuild test robot'
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	'Jason Gunthorpe',
	linux-kernel, Michael Werner

| From: Steve Wise <swise@opengridcomputing.com>
| Sent: Thursday, March 22, 2018 9:28 AM
|
| | From: Sinan Kaya <okaya@codeaurora.org>
| | Date: Thursday, March 22, 2018 7:52 AM
| |
| | Isn't this a PowerPC problem? Why penalize other architectures?
| 
| I worry it breaks PPC.

And all other architectures.  Aparraently there isn't a formal API
description for writel_relaxed() and Co., nor __raw_writel(), etc.  What I
think we need is a formal semantic definition of exactly what these APIs is
supposed to do and then we can make sure that they all do that.  Till we
have a consistent definition/implementation, trying to use these APIs in
multi-platform code will be a problem.

For instance, and this is merely an example not a prescription of what I'm
talking about:

    writel():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
      -- Ensures instruction level synchronization with respect
      --   to previous and succeeding reads and writes.

    writel_relaxed():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
      -- Ensures instruction level synchronization with respect
      --   to previous writes.

    __raw_writel():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
 
| I appreciate you doing this.

As do I!  I'm just worried that because the API semantic definitions don't
seem to be formalized for writel_relaxed() and Co., we're in danger of
getting wildly different results on one platform or another based on
differring implementation semantics.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 19:44                 ` Casey Leedom
  0 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 19:44 UTC (permalink / raw)
  To: linux-arm-kernel

| From: Steve Wise <swise@opengridcomputing.com>
| Sent: Thursday, March 22, 2018 9:28 AM
|
| | From: Sinan Kaya <okaya@codeaurora.org>
| | Date: Thursday, March 22, 2018 7:52 AM
| |
| | Isn't this a PowerPC problem? Why penalize other architectures?
| 
| I worry it breaks PPC.

And all other architectures.  Aparraently there isn't a formal API
description for writel_relaxed() and Co., nor __raw_writel(), etc.  What I
think we need is a formal semantic definition of exactly what these APIs is
supposed to do and then we can make sure that they all do that.  Till we
have a consistent definition/implementation, trying to use these APIs in
multi-platform code will be a problem.

For instance, and this is merely an example not a prescription of what I'm
talking about:

    writel():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
      -- Ensures instruction level synchronization with respect
      --   to previous and succeeding reads and writes.

    writel_relaxed():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
      -- Ensures instruction level synchronization with respect
      --   to previous writes.

    __raw_writel():
      -- Ensures correct byte ordering.
      -- Ensures no compiler reordering.
 
| I appreciate you doing this.

As do I!  I'm just worried that because the API semantic definitions don't
seem to be formalized for writel_relaxed() and Co., we're in danger of
getting wildly different results on one platform or another based on
differring implementation semantics.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 19:44                 ` Casey Leedom
@ 2018-03-22 20:16                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 20:16 UTC (permalink / raw)
  To: Casey Leedom
  Cc: SWise OGC, Sinan Kaya, 'kbuild test robot',
	kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	linux-kernel, Michael Werner

On Thu, Mar 22, 2018 at 07:44:51PM +0000, Casey Leedom wrote:
> | From: Steve Wise <swise@opengridcomputing.com>
> | Sent: Thursday, March 22, 2018 9:28 AM
> |
> | | From: Sinan Kaya <okaya@codeaurora.org>
> | | Date: Thursday, March 22, 2018 7:52 AM
> | |
> | | Isn't this a PowerPC problem? Why penalize other architectures?
> | 
> | I worry it breaks PPC.
>
> And all other architectures.  Aparraently there isn't a formal API
> description for writel_relaxed() and Co., nor __raw_writel(), etc.

We have this:

Documentation/memory-barriers.txt lines 2600-2677/3136 85%

 (*) readX_relaxed(), writeX_relaxed()

     These are similar to readX() and writeX(), but provide weaker memory
     ordering guarantees.  Specifically, they do not guarantee ordering with
     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
     ordering with respect to LOCK or UNLOCK operations.  If the latter is
     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
     the same peripheral are guaranteed to be ordered with respect to each
     other.

Which basically says they are the same as writel() except they are not
required to be contained by a spinlock, which is the expensive thing
ARM and PPC are doing with the barriers in writel()

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 20:16                   ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 20:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 22, 2018 at 07:44:51PM +0000, Casey Leedom wrote:
> | From: Steve Wise <swise@opengridcomputing.com>
> | Sent: Thursday, March 22, 2018 9:28 AM
> |
> | | From: Sinan Kaya <okaya@codeaurora.org>
> | | Date: Thursday, March 22, 2018 7:52 AM
> | |
> | | Isn't this a PowerPC problem? Why penalize other architectures?
> | 
> | I worry it breaks PPC.
>
> And all other architectures.  Aparraently there isn't a formal API
> description for writel_relaxed() and Co., nor __raw_writel(), etc.

We have this:

Documentation/memory-barriers.txt lines 2600-2677/3136 85%

 (*) readX_relaxed(), writeX_relaxed()

     These are similar to readX() and writeX(), but provide weaker memory
     ordering guarantees.  Specifically, they do not guarantee ordering with
     respect to normal memory accesses (e.g. DMA buffers) nor do they guarantee
     ordering with respect to LOCK or UNLOCK operations.  If the latter is
     required, an mmiowb() barrier can be used.  Note that relaxed accesses to
     the same peripheral are guaranteed to be ordered with respect to each
     other.

Which basically says they are the same as writel() except they are not
required to be contained by a spinlock, which is the expensive thing
ARM and PPC are doing with the barriers in writel()

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 20:16                   ` Jason Gunthorpe
@ 2018-03-22 20:45                     ` Casey Leedom
  -1 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 20:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: SWise OGC, Sinan Kaya, 'kbuild test robot',
	kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	linux-kernel, Michael Werner

Yes, but ...

For instance, I see that the x86 writel() has "memory" in its asm(), which
prevents GCC from reordering generated instructions.  And it ~looks like~
arm64 ~sort of~ gets that with the inclusion of __iowmb() (which translates
to wmb() then dsb(st) which finally holds the GCC "memory" barrier).  Is
this part of the documented semantic of the writel_relaxed()?  The PowerPC
stuff simply defines writel_relaxed() as writel() and I can't find the
bottom of that Rabbit Hole ...

I'm guessing~ that this line in the documentation ~may~ imply the GCC
ordering:

     ... Note that relaxed accesses to
     the same peripheral are guaranteed to be ordered with respect to each
     other. ...

In any case, we really only have a few places where we (the various Chelsio
drivers) need to worry about this: the "Fast Paths" where we have a lot of
I/O to the device.  I think we should leave everything else alone.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 20:45                     ` Casey Leedom
  0 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 20:45 UTC (permalink / raw)
  To: linux-arm-kernel

Yes, but ...

For instance, I see that the x86 writel() has "memory" in its asm(), which
prevents GCC from reordering generated instructions.  And it ~looks like~
arm64 ~sort of~ gets that with the inclusion of __iowmb() (which translates
to wmb() then dsb(st) which finally holds the GCC "memory" barrier).  Is
this part of the documented semantic of the writel_relaxed()?  The PowerPC
stuff simply defines writel_relaxed() as writel() and I can't find the
bottom of that Rabbit Hole ...

I'm guessing~ that this line in the documentation ~may~ imply the GCC
ordering:

     ... Note that relaxed accesses to
     the same peripheral are guaranteed to be ordered with respect to each
     other. ...

In any case, we really only have a few places where we (the various Chelsio
drivers) need to worry about this: the "Fast Paths" where we have a lot of
I/O to the device.  I think we should leave everything else alone.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 20:45                     ` Casey Leedom
@ 2018-03-22 21:25                       ` Jason Gunthorpe
  -1 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 21:25 UTC (permalink / raw)
  To: Casey Leedom
  Cc: SWise OGC, Sinan Kaya, 'kbuild test robot',
	kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	linux-kernel, Michael Werner

On Thu, Mar 22, 2018 at 08:45:11PM +0000, Casey Leedom wrote:

> I'm guessing~ that this line in the documentation ~may~ imply the GCC
> ordering:
> 
>      ... Note that relaxed accesses to
>      the same peripheral are guaranteed to be ordered with respect to each
>      other. ...

An arch can't guarentee "ordered with respect to each other" without
preventing the compiler from re-ordering, so yes, any correct
implementation of writel_relaxed must prevent compiler re-ordering.

eg with volatile or a compiler barrier, or whatever.

> In any case, we really only have a few places where we (the various Chelsio
> drivers) need to worry about this: the "Fast Paths" where we have a lot of
> I/O to the device.  I think we should leave everything else alone.

Yes.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 21:25                       ` Jason Gunthorpe
  0 siblings, 0 replies; 100+ messages in thread
From: Jason Gunthorpe @ 2018-03-22 21:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 22, 2018 at 08:45:11PM +0000, Casey Leedom wrote:

> I'm guessing~ that this line in the documentation ~may~ imply the GCC
> ordering:
> 
>      ... Note that relaxed accesses to
>      the same peripheral are guaranteed to be ordered with respect to each
>      other. ...

An arch can't guarentee "ordered with respect to each other" without
preventing the compiler from re-ordering, so yes, any correct
implementation of writel_relaxed must prevent compiler re-ordering.

eg with volatile or a compiler barrier, or whatever.

> In any case, we really only have a few places where we (the various Chelsio
> drivers) need to worry about this: the "Fast Paths" where we have a lot of
> I/O to the device.  I think we should leave everything else alone.

Yes.

Jason

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 20:45                     ` Casey Leedom
@ 2018-03-22 21:27                       ` Sinan Kaya
  -1 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 21:27 UTC (permalink / raw)
  To: Casey Leedom, Jason Gunthorpe
  Cc: SWise OGC, 'kbuild test robot',
	kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	linux-kernel, Michael Werner

On 3/22/2018 4:45 PM, Casey Leedom wrote:
> Yes, but ...
> 
> For instance, I see that the x86 writel() has "memory" in its asm(), which
> prevents GCC from reordering generated instructions.  And it ~looks like~
> arm64 ~sort of~ gets that with the inclusion of __iowmb() (which translates
> to wmb() then dsb(st) which finally holds the GCC "memory" barrier).  Is
> this part of the documented semantic of the writel_relaxed()?  The PowerPC
> stuff simply defines writel_relaxed() as writel() and I can't find the
> bottom of that Rabbit Hole ...
> 

This is changing. See "RFC on writel and writel_relaxed" thread. PowerPC 
maintainers are looking for a way to implement this.

What matters is the description in the barriers document. See also 
section "MMIO access primitives" here about mmiowb()

https://lwn.net/Articles/697539/


> I'm guessing~ that this line in the documentation ~may~ imply the GCC
> ordering:
> 
>      ... Note that relaxed accesses to
>      the same peripheral are guaranteed to be ordered with respect to each
>      other. ...
> 

This can be a compiler barrier for some arches and/or can be architecturally 
guaranteed as in ARM64's device nGnRE mapping (non-gathering non-reordering with
early acknowledgment).

Both writel() and writel_relaxed() need to guarantee ordering with respect to
what HW observes for writes. 

They have different guarantees regarding the code surrounding write like you
identified.

> In any case, we really only have a few places where we (the various Chelsio
> drivers) need to worry about this: the "Fast Paths" where we have a lot of
> I/O to the device.  I think we should leave everything else alone.

makes sense

> 
> Casey
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 21:27                       ` Sinan Kaya
  0 siblings, 0 replies; 100+ messages in thread
From: Sinan Kaya @ 2018-03-22 21:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/22/2018 4:45 PM, Casey Leedom wrote:
> Yes, but ...
> 
> For instance, I see that the x86 writel() has "memory" in its asm(), which
> prevents GCC from reordering generated instructions.  And it ~looks like~
> arm64 ~sort of~ gets that with the inclusion of __iowmb() (which translates
> to wmb() then dsb(st) which finally holds the GCC "memory" barrier).  Is
> this part of the documented semantic of the writel_relaxed()?  The PowerPC
> stuff simply defines writel_relaxed() as writel() and I can't find the
> bottom of that Rabbit Hole ...
> 

This is changing. See "RFC on writel and writel_relaxed" thread. PowerPC 
maintainers are looking for a way to implement this.

What matters is the description in the barriers document. See also 
section "MMIO access primitives" here about mmiowb()

https://lwn.net/Articles/697539/


> I'm guessing~ that this line in the documentation ~may~ imply the GCC
> ordering:
> 
>      ... Note that relaxed accesses to
>      the same peripheral are guaranteed to be ordered with respect to each
>      other. ...
> 

This can be a compiler barrier for some arches and/or can be architecturally 
guaranteed as in ARM64's device nGnRE mapping (non-gathering non-reordering with
early acknowledgment).

Both writel() and writel_relaxed() need to guarantee ordering with respect to
what HW observes for writes. 

They have different guarantees regarding the code surrounding write like you
identified.

> In any case, we really only have a few places where we (the various Chelsio
> drivers) need to worry about this: the "Fast Paths" where we have a lot of
> I/O to the device.  I think we should leave everything else alone.

makes sense

> 
> Casey
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 21:27                       ` Sinan Kaya
@ 2018-03-22 22:02                         ` Casey Leedom
  -1 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 22:02 UTC (permalink / raw)
  To: Sinan Kaya, Jason Gunthorpe
  Cc: SWise OGC, 'kbuild test robot',
	kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Steve Wise, 'Doug Ledford',
	linux-kernel, Michael Werner

  Okay, thanks Sinan.  I ~think~ we're on the same page here.

  Our guy Michael Werner is carefully going through our drivers with Steve
Wise.  I'd like to let them work on the changes with a lot of thought and
testing before diving in too far.

  We had thought that we could get around the lack of a real Relaxed
Ordering Write on the PowerPC via the __raw_write*() API, but that isn't
really part of the defined API and various platforms define that API
differently (some doing PCI Byte Ordering rearrangements and some not).  I
think that we're going to have to work with the PowerPC platform folks to
get a real Relaxed Ordering Write.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-22 22:02                         ` Casey Leedom
  0 siblings, 0 replies; 100+ messages in thread
From: Casey Leedom @ 2018-03-22 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

  Okay, thanks Sinan.  I ~think~ we're on the same page here.

  Our guy Michael Werner is carefully going through our drivers with Steve
Wise.  I'd like to let them work on the changes with a lot of thought and
testing before diving in too far.

  We had thought that we could get around the lack of a real Relaxed
Ordering Write on the PowerPC via the __raw_write*() API, but that isn't
really part of the defined API and various platforms define that API
differently (some doing PCI Byte Ordering rearrangements and some not).  I
think that we're going to have to work with the PowerPC platform folks to
get a real Relaxed Ordering Write.

Casey

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-20  2:47   ` Sinan Kaya
  (?)
@ 2018-03-23  4:14     ` kbuild test robot
  -1 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-23  4:14 UTC (permalink / raw)
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Sinan Kaya, Steve Wise, Doug Ledford,
	Jason Gunthorpe, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'write_pnet'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      write_pnet
   cc1: some warnings being treated as errors
--
   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/cm.c:53:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'rt6_release'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      rt6_release
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 62176 bytes --]

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23  4:14     ` kbuild test robot
  0 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-23  4:14 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: kbuild-all, linux-rdma, timur, sulrich, linux-arm-msm,
	linux-arm-kernel, Sinan Kaya, Steve Wise, Doug Ledford,
	Jason Gunthorpe, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'write_pnet'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      write_pnet
   cc1: some warnings being treated as errors
--
   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/cm.c:53:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'rt6_release'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      rt6_release
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 62176 bytes --]

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [PATCH v4 4/6] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23  4:14     ` kbuild test robot
  0 siblings, 0 replies; 100+ messages in thread
From: kbuild test robot @ 2018-03-23  4:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sinan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6 next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Sinan-Kaya/ib-Eliminate-duplicate-barriers-on-weakly-ordered-archs/20180321-091659
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/device.c:40:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'write_pnet'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      write_pnet
   cc1: some warnings being treated as errors
--
   In file included from drivers/infiniband/hw/cxgb4/iw_cxgb4.h:73:0,
                    from drivers/infiniband/hw/cxgb4/cm.c:53:
   drivers/infiniband/hw/cxgb4/t4.h: In function 'pio_copy':
>> drivers/infiniband/hw/cxgb4/t4.h:460:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'rt6_release'? [-Werror=implicit-function-declaration]
      writeq_relaxed(*src, dst);
      ^~~~~~~~~~~~~~
      rt6_release
   cc1: some warnings being treated as errors

vim +460 drivers/infiniband/hw/cxgb4/t4.h

   450	
   451	/* This function copies 64 byte coalesced work request to memory
   452	 * mapped BAR2 space. For coalesced WRs, the SGE fetches data
   453	 * from the FIFO instead of from Host.
   454	 */
   455	static inline void pio_copy(u64 __iomem *dst, u64 *src)
   456	{
   457		int count = 8;
   458	
   459		while (count) {
 > 460			writeq_relaxed(*src, dst);
   461			src++;
   462			dst++;
   463			count--;
   464		}
   465	}
   466	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 62176 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180323/6e4cc2e2/attachment-0001.gz>

^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2018-03-23  4:14 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20  2:47 [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-20  2:47 ` Sinan Kaya
2018-03-20  2:47 ` [PATCH v4 1/6] RDMA/bnxt_re: " Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20 14:48   ` Jason Gunthorpe
2018-03-20 14:48     ` Jason Gunthorpe
2018-03-20 15:00     ` Sinan Kaya
2018-03-20 15:00       ` Sinan Kaya
2018-03-20 15:08       ` Sinan Kaya
2018-03-20 15:08         ` Sinan Kaya
2018-03-20 15:23         ` Jason Gunthorpe
2018-03-20 15:23           ` Jason Gunthorpe
2018-03-20 15:20       ` Jason Gunthorpe
2018-03-20 15:20         ` Jason Gunthorpe
2018-03-20 15:30         ` Sinan Kaya
2018-03-20 15:30           ` Sinan Kaya
2018-03-20 16:02           ` Jason Gunthorpe
2018-03-20 16:02             ` Jason Gunthorpe
2018-03-20  2:47 ` [PATCH v4 2/6] IB/mlx4: " Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20 14:48   ` Jason Gunthorpe
2018-03-20 14:48     ` Jason Gunthorpe
2018-03-20  2:47 ` [PATCH v4 3/6] RDMA/i40iw: " Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20 14:56   ` Jason Gunthorpe
2018-03-20 14:56     ` Jason Gunthorpe
2018-03-21 13:38   ` Shiraz Saleem
2018-03-21 13:38     ` Shiraz Saleem
2018-03-21 20:02   ` Jason Gunthorpe
2018-03-21 20:02     ` Jason Gunthorpe
2018-03-21 21:01     ` Sinan Kaya
2018-03-21 21:01       ` Sinan Kaya
2018-03-20  2:47 ` [PATCH v4 4/6] infiniband: cxgb4: " Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20 14:51   ` Jason Gunthorpe
2018-03-20 14:51     ` Jason Gunthorpe
2018-03-20 15:10     ` Steve Wise
2018-03-20 15:10       ` Steve Wise
2018-03-20 15:10       ` Steve Wise
2018-03-20 15:38     ` Steve Wise
2018-03-20 15:38       ` Steve Wise
2018-03-20 15:38       ` Steve Wise
2018-03-22  6:44   ` kbuild test robot
2018-03-22  6:44     ` kbuild test robot
2018-03-22  6:44     ` kbuild test robot
2018-03-22 12:24     ` okaya
2018-03-22 12:24       ` okaya at codeaurora.org
2018-03-22 12:48       ` okaya
2018-03-22 12:48         ` okaya at codeaurora.org
2018-03-22 14:33         ` Sinan Kaya
2018-03-22 14:33           ` Sinan Kaya
2018-03-22 14:40         ` Steve Wise
2018-03-22 14:40           ` Steve Wise
2018-03-22 14:40           ` Steve Wise
2018-03-22 14:52           ` Sinan Kaya
2018-03-22 14:52             ` Sinan Kaya
2018-03-22 16:28             ` Steve Wise
2018-03-22 16:28               ` Steve Wise
2018-03-22 16:28               ` Steve Wise
2018-03-22 19:44               ` Casey Leedom
2018-03-22 19:44                 ` Casey Leedom
2018-03-22 20:16                 ` Jason Gunthorpe
2018-03-22 20:16                   ` Jason Gunthorpe
2018-03-22 20:45                   ` Casey Leedom
2018-03-22 20:45                     ` Casey Leedom
2018-03-22 21:25                     ` Jason Gunthorpe
2018-03-22 21:25                       ` Jason Gunthorpe
2018-03-22 21:27                     ` Sinan Kaya
2018-03-22 21:27                       ` Sinan Kaya
2018-03-22 22:02                       ` Casey Leedom
2018-03-22 22:02                         ` Casey Leedom
     [not found]         ` <437ab002-b8db-24aa-583e-0e61d61aaa97@codeaurora.org>
2018-03-22 18:46           ` Jason Gunthorpe
2018-03-22 18:46             ` Jason Gunthorpe
2018-03-22 18:48         ` Jason Gunthorpe
2018-03-22 18:48           ` Jason Gunthorpe
2018-03-22 18:58           ` Sinan Kaya
2018-03-22 18:58             ` Sinan Kaya
2018-03-23  4:14   ` kbuild test robot
2018-03-23  4:14     ` kbuild test robot
2018-03-23  4:14     ` kbuild test robot
2018-03-20  2:47 ` [PATCH v4 5/6] IB/nes: " Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20 14:54   ` Jason Gunthorpe
2018-03-20 14:54     ` Jason Gunthorpe
2018-03-20 15:23     ` Sinan Kaya
2018-03-20 15:23       ` Sinan Kaya
2018-03-20 16:01       ` Jason Gunthorpe
2018-03-20 16:01         ` Jason Gunthorpe
2018-03-20 16:08         ` Sinan Kaya
2018-03-20 16:08           ` Sinan Kaya
2018-03-20 16:29           ` Jason Gunthorpe
2018-03-20 16:29             ` Jason Gunthorpe
2018-03-20  2:47 ` [PATCH v4 6/6] RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs #2 Sinan Kaya
2018-03-20  2:47   ` Sinan Kaya
2018-03-20  7:38   ` Kalderon, Michal
2018-03-20  7:38     ` Kalderon, Michal
2018-03-20 14:55   ` Jason Gunthorpe
2018-03-20 14:55     ` Jason Gunthorpe
2018-03-21 20:08 ` [PATCH v4 0/6] ib: Eliminate duplicate barriers on weakly-ordered archs Jason Gunthorpe
2018-03-21 20:08   ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.