linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 1/5] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs
       [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
@ 2018-03-22 17:09 ` Sinan Kaya
  2018-03-22 17:09 ` [PATCH v5 2/5] qlcnic: " Sinan Kaya
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:09 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Dept-GELinuxNICDev,
	linux-kernel

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index 9e5264d..0e71b74 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1858,8 +1858,8 @@ static void ql_update_small_bufq_prod_index(struct ql3_adapter *qdev)
 			qdev->small_buf_release_cnt -= 8;
 		}
 		wmb();
-		writel(qdev->small_buf_q_producer_index,
-			&port_regs->CommonRegs.rxSmallQProducerIndex);
+		writel_relaxed(qdev->small_buf_q_producer_index,
+			       &port_regs->CommonRegs.rxSmallQProducerIndex);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 2/5] qlcnic: Eliminate duplicate barriers on weakly-ordered archs
       [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
  2018-03-22 17:09 ` [PATCH v5 1/5] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
@ 2018-03-22 17:09 ` Sinan Kaya
  2018-03-22 17:10 ` [PATCH v5 3/5] bnx2x: " Sinan Kaya
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:09 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Harish Patil,
	Manish Chopra, Dept-GELinuxNICDev, linux-kernel

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Manish Chopra <manish.chopra@cavium.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 46b0372..97c146e7 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -478,7 +478,7 @@ irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *adapter)
 	wmb();
 
 	/* clear the interrupt trigger control register */
-	writel(0, adapter->isr_int_vec);
+	writel_relaxed(0, adapter->isr_int_vec);
 	intr_val = readl(adapter->isr_int_vec);
 	do {
 		intr_val = readl(adapter->tgt_status_reg);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
       [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
  2018-03-22 17:09 ` [PATCH v5 1/5] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
  2018-03-22 17:09 ` [PATCH v5 2/5] qlcnic: " Sinan Kaya
@ 2018-03-22 17:10 ` Sinan Kaya
  2018-03-23 16:20   ` David Miller
  2018-03-22 17:10 ` [PATCH v5 4/5] net: qlge: " Sinan Kaya
  2018-03-22 17:10 ` [PATCH v5 5/5] bnxt_en: " Sinan Kaya
  4 siblings, 1 reply; 14+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:10 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Ariel Elior,
	everest-linux-l2, linux-kernel

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h         | 12 ++++++++----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c     |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h     |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c    |  4 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c   |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c    |  2 +-
 7 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 352beff..d847e1b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -166,6 +166,12 @@ do {						\
 #define REG_RD8(bp, offset)		readb(REG_ADDR(bp, offset))
 #define REG_RD16(bp, offset)		readw(REG_ADDR(bp, offset))
 
+#define REG_WR_RELAXED(bp, offset, val)	\
+	writel_relaxed((u32)val, REG_ADDR(bp, offset))
+
+#define REG_WR16_RELAXED(bp, offset, val) \
+	writew_relaxed((u16)val, REG_ADDR(bp, offset))
+
 #define REG_WR(bp, offset, val)		writel((u32)val, REG_ADDR(bp, offset))
 #define REG_WR8(bp, offset, val)	writeb((u8)val, REG_ADDR(bp, offset))
 #define REG_WR16(bp, offset, val)	writew((u16)val, REG_ADDR(bp, offset))
@@ -758,10 +764,8 @@ struct bnx2x_fastpath {
 #if (BNX2X_DB_SHIFT < BNX2X_DB_MIN_SHIFT)
 #error "Min DB doorbell stride is 8"
 #endif
-#define DOORBELL(bp, cid, val) \
-	do { \
-		writel((u32)(val), bp->doorbells + (bp->db_size * (cid))); \
-	} while (0)
+#define DOORBELL_RELAXED(bp, cid, val) \
+	writel_relaxed((u32)(val), (bp)->doorbells + ((bp)->db_size * (cid)))
 
 /* TX CSUM helpers */
 #define SKB_CS_OFF(skb)		(offsetof(struct tcphdr, check) - \
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index d7c98e8..172fd79 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4155,7 +4155,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	txdata->tx_db.data.prod += nbd;
 	barrier();
 
-	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
+	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index a5265e1..a8ce5c5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -522,8 +522,8 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 	wmb();
 
 	for (i = 0; i < sizeof(rx_prods)/4; i++)
-		REG_WR(bp, fp->ustorm_rx_prods_offset + i*4,
-		       ((u32 *)&rx_prods)[i]);
+		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
+			       ((u32 *)&rx_prods)[i]);
 
 	mmiowb(); /* keep prod updates ordered */
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 1e33abd..bda5e4f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2592,7 +2592,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
 
 	txdata->tx_db.data.prod += 2;
 	barrier();
-	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
+	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
 
 	mmiowb();
 	barrier();
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af..146c40d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3817,8 +3817,8 @@ static void bnx2x_sp_prod_update(struct bnx2x *bp)
 	 */
 	mb();
 
-	REG_WR16(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
-		 bp->spq_prod_idx);
+	REG_WR16_RELAXED(bp, BAR_XSTRORM_INTMEM + XSTORM_SPQ_PROD_OFFSET(func),
+			 bp->spq_prod_idx);
 	mmiowb();
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index ffa7959..40e55d8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -105,7 +105,7 @@ static void bnx2x_vf_igu_ack_sb(struct bnx2x *bp, struct bnx2x_virtf *vf,
 
 	DP(NETIF_MSG_HW, "write 0x%08x to IGU(via GRC) addr 0x%x\n",
 	   ctl, igu_addr_ctl);
-	REG_WR(bp, igu_addr_ctl, ctl);
+	REG_WR_RELAXED(bp, igu_addr_ctl, ctl);
 	mmiowb();
 	barrier();
 }
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 76a4668..3b2f1bd 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -170,7 +170,7 @@ static int bnx2x_send_msg2pf(struct bnx2x *bp, u8 *done, dma_addr_t msg_mapping)
 	wmb();
 
 	/* Trigger the PF FW */
-	writeb(1, &zone_data->trigger.vf_pf_channel.addr_valid);
+	writeb_relaxed(1, &zone_data->trigger.vf_pf_channel.addr_valid);
 
 	/* Wait for PF to complete */
 	while ((tout >= 0) && (!*done)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 4/5] net: qlge: Eliminate duplicate barriers on weakly-ordered archs
       [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
                   ` (2 preceding siblings ...)
  2018-03-22 17:10 ` [PATCH v5 3/5] bnx2x: " Sinan Kaya
@ 2018-03-22 17:10 ` Sinan Kaya
  2018-03-22 17:10 ` [PATCH v5 5/5] bnxt_en: " Sinan Kaya
  4 siblings, 0 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:10 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Harish Patil,
	Manish Chopra, Dept-GELinuxNICDev, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/qlogic/qlge/qlge.h      | 16 ++++++++++++++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 84ac50f..3e71b65 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -2185,6 +2185,22 @@ static inline void ql_write_db_reg(u32 val, void __iomem *addr)
 }
 
 /*
+ * Doorbell Registers:
+ * Doorbell registers are virtual registers in the PCI memory space.
+ * The space is allocated by the chip during PCI initialization.  The
+ * device driver finds the doorbell address in BAR 3 in PCI config space.
+ * The registers are used to control outbound and inbound queues. For
+ * example, the producer index for an outbound queue.  Each queue uses
+ * 1 4k chunk of memory.  The lower half of the space is for outbound
+ * queues. The upper half is for inbound queues.
+ * Caller has to guarantee ordering.
+ */
+static inline void ql_write_db_reg_relaxed(u32 val, void __iomem *addr)
+{
+	writel_relaxed(val, addr);
+}
+
+/*
  * Shadow Registers:
  * Outbound queues have a consumer index that is maintained by the chip.
  * Inbound queues have a producer index that is maintained by the chip.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 50038d9..8293c202 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2700,7 +2700,8 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 		tx_ring->prod_idx = 0;
 	wmb();
 
-	ql_write_db_reg(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	ql_write_db_reg_relaxed(tx_ring->prod_idx, tx_ring->prod_idx_db_reg);
+	mmiowb();
 	netif_printk(qdev, tx_queued, KERN_DEBUG, qdev->ndev,
 		     "tx queued, slot %d, len %d\n",
 		     tx_ring->prod_idx, skb->len);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 5/5] bnxt_en: Eliminate duplicate barriers on weakly-ordered archs
       [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
                   ` (3 preceding siblings ...)
  2018-03-22 17:10 ` [PATCH v5 4/5] net: qlge: " Sinan Kaya
@ 2018-03-22 17:10 ` Sinan Kaya
  4 siblings, 0 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-22 17:10 UTC (permalink / raw)
  To: netdev, timur, sulrich
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Michael Chan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1500243..befb538 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1922,7 +1922,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 		/* Sync BD data before updating doorbell */
 		wmb();
 
-		bnxt_db_write(bp, db, DB_KEY_TX | prod);
+		bnxt_db_write_relaxed(bp, db, DB_KEY_TX | prod);
 	}
 
 	cpr->cp_raw_cons = raw_cons;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1989c47..5e453b9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1401,6 +1401,15 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
 		((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
 }
 
+/* For TX and RX ring doorbells with no ordering guarantee*/
+static inline void bnxt_db_write_relaxed(struct bnxt *bp, void __iomem *db,
+					 u32 val)
+{
+	writel_relaxed(val, db);
+	if (bp->flags & BNXT_FLAG_DOUBLE_DB)
+		writel_relaxed(val, db);
+}
+
 /* For TX and RX ring doorbells */
 static inline void bnxt_db_write(struct bnxt *bp, void __iomem *db, u32 val)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-22 17:10 ` [PATCH v5 3/5] bnx2x: " Sinan Kaya
@ 2018-03-23 16:20   ` David Miller
  2018-03-23 16:31     ` Sinan Kaya
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2018-03-23 16:20 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

From: Sinan Kaya <okaya@codeaurora.org>
Date: Thu, 22 Mar 2018 13:10:00 -0400

> Code includes wmb() followed by writel(). writel() already has a
> barrier on some architectures like arm64.
 ...
> @@ -4155,7 +4155,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	txdata->tx_db.data.prod += nbd;
>  	barrier();
>  
> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
>  
>  	mmiowb();
 ...
> @@ -2592,7 +2592,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
>  
>  	txdata->tx_db.data.prod += 2;
>  	barrier();
> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);

These are compiler barriers being used here, not wmb().

Look, if I can't see a clear:

	wmb()
	writel()

sequence in the patch hunks, I am going to keep pushing back on
these changes.

Thank you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 16:20   ` David Miller
@ 2018-03-23 16:31     ` Sinan Kaya
  2018-03-23 16:43       ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Sinan Kaya @ 2018-03-23 16:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

On 3/23/2018 12:20 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Thu, 22 Mar 2018 13:10:00 -0400
> 
>> Code includes wmb() followed by writel(). writel() already has a
>> barrier on some architectures like arm64.
>  ...
>> @@ -4155,7 +4155,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>  	txdata->tx_db.data.prod += nbd;
>>  	barrier();
>>  
>> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
>>  
>>  	mmiowb();
>  ...
>> @@ -2592,7 +2592,7 @@ static int bnx2x_run_loopback(struct bnx2x *bp, int loopback_mode)
>>  
>>  	txdata->tx_db.data.prod += 2;
>>  	barrier();
>> -	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>> +	DOORBELL_RELAXED(bp, txdata->cid, txdata->tx_db.raw);
> 
> These are compiler barriers being used here, not wmb().
> 
> Look, if I can't see a clear:
> 
> 	wmb()
> 	writel()
> 
> sequence in the patch hunks, I am going to keep pushing back on
> these changes.

Sorry, you got me confused now.

If you look at the code closer, you'll see this.

	wmb();

	txdata->tx_db.data.prod += nbd;
	barrier();

	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);

and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
it obvious that we have a relaxed operator inside the macro.

Did I miss something?

of course, treating barrier() universally as a write barrier is wrong.

> 
> Thank you.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 16:31     ` Sinan Kaya
@ 2018-03-23 16:43       ` David Miller
  2018-03-23 16:51         ` Sinan Kaya
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2018-03-23 16:43 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 12:31:12 -0400

> Sorry, you got me confused now.
> 
> If you look at the code closer, you'll see this.
> 
> 	wmb();
> 
> 	txdata->tx_db.data.prod += nbd;
> 	barrier();
> 
> 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
> 
> and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
> it obvious that we have a relaxed operator inside the macro.

This still doesn't match the stated pattern.

	wmb();
	/* no other memory or I/O or IOMEM operation */
	writel();

There is a write to a producer index there and then no non-compiler
barrier or any kind before the writel().

So, in fact, it might really need that implicit writel() barrier here!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 16:43       ` David Miller
@ 2018-03-23 16:51         ` Sinan Kaya
  2018-03-23 17:04           ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Sinan Kaya @ 2018-03-23 16:51 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

On 3/23/2018 12:43 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Fri, 23 Mar 2018 12:31:12 -0400
> 
>> Sorry, you got me confused now.
>>
>> If you look at the code closer, you'll see this.
>>
>> 	wmb();
>>
>> 	txdata->tx_db.data.prod += nbd;
>> 	barrier();
>>
>> 	DOORBELL(bp, txdata->cid, txdata->tx_db.raw);
>>
>> and you also asked me to rename DOORBELL to DOORBELL_RELAXED() to make
>> it obvious that we have a relaxed operator inside the macro.
> 
> This still doesn't match the stated pattern.

I can certainly update the commit text for this or spin into its own
patch to make it obvious.

> 
> 	wmb();
> 	/* no other memory or I/O or IOMEM operation */
> 	writel();
> 
> There is a write to a producer index there and then no non-compiler
> barrier or any kind before the writel().
> 
> So, in fact, it might really need that implicit writel() barrier here!
> 

It could if txdata->tx_db was not a union. There is a data dependency
between txdata->tx_db.data.prod and txdata->tx_db.raw. 

So, no reordering.

I can argue that barrier() here is useless in fact.

Anyhow, I'll spin this piece out of this patch so that we pay special
attention with a better description.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 16:51         ` Sinan Kaya
@ 2018-03-23 17:04           ` David Miller
  2018-03-23 17:13             ` Sinan Kaya
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2018-03-23 17:04 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 12:51:47 -0400

> It could if txdata->tx_db was not a union. There is a data dependency
> between txdata->tx_db.data.prod and txdata->tx_db.raw. 
> 
> So, no reordering.

I don't see it that way, the code requires that:

 	txdata->tx_db.data.prod += nbd;

is visible before the doorbell update.

barrier() doesn't provide that.

Neither does writel_relaxed().  However plain writel() does.

Therefore the code is only correct as-is, and your change potentially
adds a reordering problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 17:04           ` David Miller
@ 2018-03-23 17:13             ` Sinan Kaya
  2018-03-23 17:16               ` David Miller
  2018-03-24 14:30               ` Chopra, Manish
  0 siblings, 2 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-23 17:13 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

On 3/23/2018 1:04 PM, David Miller wrote:
> From: Sinan Kaya <okaya@codeaurora.org>
> Date: Fri, 23 Mar 2018 12:51:47 -0400
> 
>> It could if txdata->tx_db was not a union. There is a data dependency
>> between txdata->tx_db.data.prod and txdata->tx_db.raw. 
>>
>> So, no reordering.
> 
> I don't see it that way, the code requires that:
> 
>  	txdata->tx_db.data.prod += nbd;
> 
> is visible before the doorbell update.> 
> barrier() doesn't provide that.
> 
> Neither does writel_relaxed().  However plain writel() does.

Correct for some architectures including ARM but not correct universally.

writel() just guarantees register read/writes before and after to be
ordered when HW observes it. 

writel() doesn't guarantee that the memory update is visible to the HW
on all architectures.

If you need memory update visibility, that barrier() should have been a
wmb()

A correct multi-arch pattern is

wmb()
writel_relaxed()
mmiowb()

We have decided to drop a similar patch on Infiniband due to incorrect
usage of barrier(). If you feel strongly about it, I can remove the
DOORBELL change.

> 
> Therefore the code is only correct as-is, and your change potentially
> adds a reordering problem.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 17:13             ` Sinan Kaya
@ 2018-03-23 17:16               ` David Miller
  2018-03-24 14:30               ` Chopra, Manish
  1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2018-03-23 17:16 UTC (permalink / raw)
  To: okaya
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	ariel.elior, everest-linux-l2, linux-kernel

From: Sinan Kaya <okaya@codeaurora.org>
Date: Fri, 23 Mar 2018 13:13:46 -0400

> We have decided to drop a similar patch on Infiniband due to incorrect
> usage of barrier(). If you feel strongly about it, I can remove the
> DOORBELL change.

Either remove the DOORBELL change or properly adjust the barrier() to
be a wmb().

Be sure to mention this in the commit message, explicitly.

Thank you.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 17:13             ` Sinan Kaya
  2018-03-23 17:16               ` David Miller
@ 2018-03-24 14:30               ` Chopra, Manish
  2018-03-24 14:57                 ` Sinan Kaya
  1 sibling, 1 reply; 14+ messages in thread
From: Chopra, Manish @ 2018-03-24 14:30 UTC (permalink / raw)
  To: Sinan Kaya, David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel, Elior,
	Ariel, Dept-Eng Everest Linux L2, linux-kernel

> -----Original Message-----
> From: Sinan Kaya [mailto:okaya@codeaurora.org]
> Sent: Friday, March 23, 2018 10:44 PM
> To: David Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org; timur@codeaurora.org; sulrich@codeaurora.org;
> linux-arm-msm@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Elior,
> Ariel <Ariel.Elior@cavium.com>; Dept-Eng Everest Linux L2 <Dept-
> EngEverestLinuxL2@cavium.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-
> ordered archs
> 
> On 3/23/2018 1:04 PM, David Miller wrote:
> > From: Sinan Kaya <okaya@codeaurora.org>
> > Date: Fri, 23 Mar 2018 12:51:47 -0400
> >
> >> It could if txdata->tx_db was not a union. There is a data dependency
> >> between txdata->tx_db.data.prod and txdata->tx_db.raw.
> >>
> >> So, no reordering.
> >
> > I don't see it that way, the code requires that:
> >
> >  	txdata->tx_db.data.prod += nbd;
> >
> > is visible before the doorbell update.>
> > barrier() doesn't provide that.
> >
> > Neither does writel_relaxed().  However plain writel() does.
> 
> Correct for some architectures including ARM but not correct universally.
> 
> writel() just guarantees register read/writes before and after to be ordered
> when HW observes it.
> 
> writel() doesn't guarantee that the memory update is visible to the HW on all
> architectures.
> 
> If you need memory update visibility, that barrier() should have been a
> wmb()
> 
> A correct multi-arch pattern is
> 
> wmb()
> writel_relaxed()
> mmiowb()
> 

Sinan,  Since you have mentioned the use of mmiowb() here after writel_relaxed().
I believe this is not always correct for all types of IO mapped memory [Specially if IO memory is mapped using write combined (for ex. Ioremap_wc())].
We have a current issue on our NIC (qede) driver on x86 for which the patch is already been sent more than a week ago [Still awaiting to hear from David on that].
where mmiowb() seems to be useless since we use write combined mapped doorbell and mmiowb() just seems to be a compiler barrier() there.
So in order to flush write combined buffer we really need writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
I think  the correct pattern in such cases (for write combined IO) would have been like below - 

wmb();
writel_relaxed();
wmb(); -> To flush the writes actually.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-24 14:30               ` Chopra, Manish
@ 2018-03-24 14:57                 ` Sinan Kaya
  0 siblings, 0 replies; 14+ messages in thread
From: Sinan Kaya @ 2018-03-24 14:57 UTC (permalink / raw)
  To: Chopra, Manish, David Miller
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel, Elior,
	Ariel, Dept-Eng Everest Linux L2, linux-kernel

On 3/24/2018 10:30 AM, Chopra, Manish wrote:
>> -----Original Message-----
>> From: Sinan Kaya [mailto:okaya@codeaurora.org]
>> Sent: Friday, March 23, 2018 10:44 PM
>> To: David Miller <davem@davemloft.net>
>> Cc: netdev@vger.kernel.org; timur@codeaurora.org; sulrich@codeaurora.org;
>> linux-arm-msm@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Elior,
>> Ariel <Ariel.Elior@cavium.com>; Dept-Eng Everest Linux L2 <Dept-
>> EngEverestLinuxL2@cavium.com>; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v5 3/5] bnx2x: Eliminate duplicate barriers on weakly-
>> ordered archs
>>
>> On 3/23/2018 1:04 PM, David Miller wrote:
>>> From: Sinan Kaya <okaya@codeaurora.org>
>>> Date: Fri, 23 Mar 2018 12:51:47 -0400
>>>
>>>> It could if txdata->tx_db was not a union. There is a data dependency
>>>> between txdata->tx_db.data.prod and txdata->tx_db.raw.
>>>>
>>>> So, no reordering.
>>>
>>> I don't see it that way, the code requires that:
>>>
>>>  	txdata->tx_db.data.prod += nbd;
>>>
>>> is visible before the doorbell update.>
>>> barrier() doesn't provide that.
>>>
>>> Neither does writel_relaxed().  However plain writel() does.
>>
>> Correct for some architectures including ARM but not correct universally.
>>
>> writel() just guarantees register read/writes before and after to be ordered
>> when HW observes it.
>>
>> writel() doesn't guarantee that the memory update is visible to the HW on all
>> architectures.
>>
>> If you need memory update visibility, that barrier() should have been a
>> wmb()
>>
>> A correct multi-arch pattern is
>>
>> wmb()
>> writel_relaxed()
>> mmiowb()
>>
> 
> Sinan,  Since you have mentioned the use of mmiowb() here after writel_relaxed().
> I believe this is not always correct for all types of IO mapped memory [Specially if IO memory is mapped using write combined (for ex. Ioremap_wc())].
> We have a current issue on our NIC (qede) driver on x86 for which the patch is already been sent more than a week ago [Still awaiting to hear from David on that].
> where mmiowb() seems to be useless since we use write combined mapped doorbell and mmiowb() just seems to be a compiler barrier() there.
> So in order to flush write combined buffer we really need writel_relaxed() followed by a wmb() to synchronize writes among CPU cores.
> I think  the correct pattern in such cases (for write combined IO) would have been like below - 
> 
> wmb();
> writel_relaxed();
> wmb(); -> To flush the writes actually.

You actually have good points. It is the same problem with barrier() description above.

The answer really depends on what you are doing/expecting after mmiowb(). If you expect
that some memory content to be observed by HW, you definitely need a wmb() like
you mentioned. 

If you just want writes to be flushed but you don't expect any memory content to be
updated, you need a mmiowb(). 

https://lwn.net/Articles/198988/
"There is mmiowb(), but its real purpose is to enforce ordering between MMIO operations only."

> 
> Thanks.
> 
> 
> 
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-03-24 14:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1521738603-23596-1-git-send-email-okaya@codeaurora.org>
2018-03-22 17:09 ` [PATCH v5 1/5] net: qla3xxx: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-22 17:09 ` [PATCH v5 2/5] qlcnic: " Sinan Kaya
2018-03-22 17:10 ` [PATCH v5 3/5] bnx2x: " Sinan Kaya
2018-03-23 16:20   ` David Miller
2018-03-23 16:31     ` Sinan Kaya
2018-03-23 16:43       ` David Miller
2018-03-23 16:51         ` Sinan Kaya
2018-03-23 17:04           ` David Miller
2018-03-23 17:13             ` Sinan Kaya
2018-03-23 17:16               ` David Miller
2018-03-24 14:30               ` Chopra, Manish
2018-03-24 14:57                 ` Sinan Kaya
2018-03-22 17:10 ` [PATCH v5 4/5] net: qlge: " Sinan Kaya
2018-03-22 17:10 ` [PATCH v5 5/5] bnxt_en: " Sinan Kaya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).