All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06 ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel, Sinan Kaya

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory. I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Change since 7:

* API clarification with Linus and several architecture folks regarding
writel()

https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html

* removed wmb() in front of writel() at several places.
* remove old IA64 comments regarding ordering.

Sinan Kaya (7):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 15 ++-----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 22 ++++++++----------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  8 +------
 drivers/net/ethernet/intel/igb/igb_main.c         | 14 ++----------
 drivers/net/ethernet/intel/igbvf/netdev.c         | 16 +++++---------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 23 ++-----------------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ++++++++---------------
 8 files changed, 30 insertions(+), 100 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06 ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory. I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Change since 7:

* API clarification with Linus and several architecture folks regarding
writel()

https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html

* removed wmb() in front of writel() at several places.
* remove old IA64 comments regarding ordering.

Sinan Kaya (7):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 15 ++-----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 22 ++++++++----------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  8 +------
 drivers/net/ethernet/intel/igb/igb_main.c         | 14 ++----------
 drivers/net/ethernet/intel/igbvf/netdev.c         | 16 +++++---------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 23 ++-----------------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ++++++++---------------
 8 files changed, 30 insertions(+), 100 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +-------
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f174c72..1b9fa7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1523,12 +1529,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2274,11 +2274,7 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 
 static inline void i40e_xdp_ring_update_tail(struct i40e_ring *xdp_ring)
 {
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.
-	 */
-	wmb();
-	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+	writel(xdp_ring->next_to_use, xdp_ring->tail);
 }
 
 /**
@@ -3444,7 +3440,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 12bd937..eb5556e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -804,12 +804,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2379,7 +2373,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +-------
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f174c72..1b9fa7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1523,12 +1529,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2274,11 +2274,7 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 
 static inline void i40e_xdp_ring_update_tail(struct i40e_ring *xdp_ring)
 {
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.
-	 */
-	wmb();
-	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+	writel(xdp_ring->next_to_use, xdp_ring->tail);
 }
 
 /**
@@ -3444,7 +3440,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 12bd937..eb5556e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -804,12 +804,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2379,7 +2373,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +-------
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f174c72..1b9fa7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1523,12 +1529,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2274,11 +2274,7 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 
 static inline void i40e_xdp_ring_update_tail(struct i40e_ring *xdp_ring)
 {
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.
-	 */
-	wmb();
-	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+	writel(xdp_ring->next_to_use, xdp_ring->tail);
 }
 
 /**
@@ -3444,7 +3440,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 12bd937..eb5556e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -804,12 +804,6 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
 	writel(val, rx_ring->tail);
 }
 
@@ -2379,7 +2373,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: sulrich, netdev, timur, linux-kernel, Sinan Kaya,
	intel-wired-lan, linux-arm-msm, linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..c17924b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1696,12 +1696,6 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -2467,10 +2461,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (xdp_xmit) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
@@ -8080,12 +8070,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/*
-	 * Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -8102,7 +8087,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10034,10 +10019,6 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	if (unlikely(!ring))
 		return;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
 	writel(ring->next_to_use, ring->tail);
 
 	return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..c17924b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1696,12 +1696,6 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -2467,10 +2461,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (xdp_xmit) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
@@ -8080,12 +8070,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/*
-	 * Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -8102,7 +8087,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10034,10 +10019,6 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	if (unlikely(!ring))
 		return;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
 	writel(ring->next_to_use, ring->tail);
 
 	return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..c17924b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1696,12 +1696,6 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -2467,10 +2461,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (xdp_xmit) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
@@ -8080,12 +8070,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/*
-	 * Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -8102,7 +8087,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10034,10 +10019,6 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	if (unlikely(!ring))
 		return;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
 	writel(ring->next_to_use, ring->tail);
 
 	return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..c17924b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1696,12 +1696,6 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -2467,10 +2461,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (xdp_xmit) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
@@ -8080,12 +8070,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/*
-	 * Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -8102,7 +8087,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10034,10 +10019,6 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	if (unlikely(!ring))
 		return;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
 	writel(ring->next_to_use, ring->tail);
 
 	return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index e2b7502..d9f186a 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -246,12 +246,6 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		else
 			i--;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		*/
-		wmb();
 		writel(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
@@ -2289,16 +2283,16 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 	}
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(adapter->txd_cmd);
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
+
+	/* We use this memory barrier to make certain all of the
+	 * status bits have been updated before next_to_watch is
+	 * written.
 	 */
 	wmb();
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index e2b7502..d9f186a 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -246,12 +246,6 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		else
 			i--;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		*/
-		wmb();
 		writel(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
@@ -2289,16 +2283,16 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 	}
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(adapter->txd_cmd);
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
+
+	/* We use this memory barrier to make certain all of the
+	 * status bits have been updated before next_to_watch is
+	 * written.
 	 */
 	wmb();
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index e2b7502..d9f186a 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -246,12 +246,6 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		else
 			i--;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		*/
-		wmb();
 		writel(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
@@ -2289,16 +2283,16 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 	}
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(adapter->txd_cmd);
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
+
+	/* We use this memory barrier to make certain all of the
+	 * status bits have been updated before next_to_watch is
+	 * written.
 	 */
 	wmb();
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: sulrich, netdev, timur, linux-kernel, Sinan Kaya,
	intel-wired-lan, linux-arm-msm, linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc3..c3f7130 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5652,11 +5652,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -5674,7 +5670,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8073,12 +8069,6 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc3..c3f7130 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5652,11 +5652,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -5674,7 +5670,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8073,12 +8069,6 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc3..c3f7130 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5652,11 +5652,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -5674,7 +5670,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8073,12 +8069,6 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c1c0bc3..c3f7130 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5652,11 +5652,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -5674,7 +5670,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8073,12 +8069,6 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df86070..41e3aaa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -172,13 +172,6 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
-
 		/* notify hardware of new descriptors */
 		writel(i, rx_ring->tail);
 	}
@@ -1036,11 +1029,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 	/* record SW timestamp if HW timestamp is not available */
 	skb_tx_timestamp(first->skb);
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -1055,7 +1044,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df86070..41e3aaa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -172,13 +172,6 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
-
 		/* notify hardware of new descriptors */
 		writel(i, rx_ring->tail);
 	}
@@ -1036,11 +1029,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 	/* record SW timestamp if HW timestamp is not available */
 	skb_tx_timestamp(first->skb);
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -1055,7 +1044,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df86070..41e3aaa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -172,13 +172,6 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
-
 		/* notify hardware of new descriptors */
 		writel(i, rx_ring->tail);
 	}
@@ -1036,11 +1029,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 	/* record SW timestamp if HW timestamp is not available */
 	skb_tx_timestamp(first->skb);
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
+	/* We need this memory barrier to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -1055,7 +1044,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 6/7] ixgbevf: keep writel() closer to wmb()
  2018-04-02 19:06 ` Sinan Kaya
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 447ce1d..c75ea1f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e3d04f2..757dac6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 6/7] ixgbevf: keep writel() closer to wmb()
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 447ce1d..c75ea1f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e3d04f2..757dac6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 6/7] ixgbevf: keep writel() closer to wmb()
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 447ce1d..c75ea1f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e3d04f2..757dac6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
  (?)
@ 2018-04-02 19:06   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 757dac6..29b71a7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -719,12 +719,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -1228,10 +1222,6 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		struct ixgbevf_ring *xdp_ring =
 			adapter->xdp_ring[rx_ring->queue_index];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
@@ -3985,11 +3975,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier (wmb) to make certain all of the
+	/* We also need this memory barrier (wmb) to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -4004,7 +3990,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 757dac6..29b71a7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -719,12 +719,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -1228,10 +1222,6 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		struct ixgbevf_ring *xdp_ring =
 			adapter->xdp_ring[rx_ring->queue_index];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
@@ -3985,11 +3975,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier (wmb) to make certain all of the
+	/* We also need this memory barrier (wmb) to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -4004,7 +3990,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-02 19:06   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-02 19:06 UTC (permalink / raw)
  To: intel-wired-lan

memory-barriers.txt has been updated as follows:

"When using writel(), a prior wmb() is not needed to guarantee that the
cache coherent memory writes have completed before writing to the MMIO
region."

Remove old IA-64 comments in the code along with unneeded wmb() in front
of writel().

There are places in the code where wmb() has been used as a double barrier
for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
such places, keep the wmb() but replace the following writel() with
writel_relaxed() to have a sequence as

wmb()
writel_relaxed()
mmio_wb()

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 757dac6..29b71a7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -719,12 +719,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		/* update next to alloc since we have filled the ring */
 		rx_ring->next_to_alloc = i;
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.  (Only
-		 * applicable for weak-ordered memory model archs,
-		 * such as IA-64).
-		 */
-		wmb();
 		writel(i, rx_ring->tail);
 	}
 }
@@ -1228,10 +1222,6 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		struct ixgbevf_ring *xdp_ring =
 			adapter->xdp_ring[rx_ring->queue_index];
 
-		/* Force memory writes to complete before letting h/w
-		 * know there are new descriptors to fetch.
-		 */
-		wmb();
 		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
@@ -3985,11 +3975,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier (wmb) to make certain all of the
+	/* We also need this memory barrier (wmb) to make certain all of the
 	 * status bits have been updated before next_to_watch is written.
 	 */
 	wmb();
@@ -4004,7 +3990,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06 ` Sinan Kaya
@ 2018-04-03  2:59   ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-03  2:59 UTC (permalink / raw)
  To: jeffrey.t.kirsher, Alexander Duyck
  Cc: netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

Alex,

On 4/2/2018 3:06 PM, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory. I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Change since 7:
> 
> * API clarification with Linus and several architecture folks regarding
> writel()
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
> 
> * removed wmb() in front of writel() at several places.
> * remove old IA64 comments regarding ordering.
> 

What do you think about this version? Did I miss any SMP barriers?


> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> 
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 15 ++-----------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 22 ++++++++----------
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  8 +------
>  drivers/net/ethernet/intel/igb/igb_main.c         | 14 ++----------
>  drivers/net/ethernet/intel/igbvf/netdev.c         | 16 +++++---------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 23 ++-----------------
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ++++++++---------------
>  8 files changed, 30 insertions(+), 100 deletions(-)
> 

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03  2:59   ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-03  2:59 UTC (permalink / raw)
  To: linux-arm-kernel

Alex,

On 4/2/2018 3:06 PM, Sinan Kaya wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> I did a regex search for wmb() followed by writel() in each drivers
> directory. I scrubbed the ones I care about in this series.
> 
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
> 
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
> 
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
> 
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
> 
> Feel free to apply patches individually.
> 
> Change since 7:
> 
> * API clarification with Linus and several architecture folks regarding
> writel()
> 
> https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
> 
> * removed wmb() in front of writel() at several places.
> * remove old IA64 comments regarding ordering.
> 

What do you think about this version? Did I miss any SMP barriers?


> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> 
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     | 15 ++-----------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 22 ++++++++----------
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  8 +------
>  drivers/net/ethernet/intel/igb/igb_main.c         | 14 ++----------
>  drivers/net/ethernet/intel/igbvf/netdev.c         | 16 +++++---------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 23 ++-----------------
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 27 ++++++++---------------
>  8 files changed, 30 insertions(+), 100 deletions(-)
> 

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-03  2:59   ` Sinan Kaya
  (?)
@ 2018-04-03 17:47     ` Alexander Duyck
  -1 siblings, 0 replies; 40+ messages in thread
From: Alexander Duyck @ 2018-04-03 17:47 UTC (permalink / raw)
  To: Sinan Kaya, intel-wired-lan
  Cc: Jeff Kirsher, Netdev, Timur Tabi, sulrich, linux-arm-msm,
	linux-arm-kernel

On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Alex,
>
> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory. I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Change since 7:
>>
>> * API clarification with Linus and several architecture folks regarding
>> writel()
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>
>> * removed wmb() in front of writel() at several places.
>> * remove old IA64 comments regarding ordering.
>>
>
> What do you think about this version? Did I miss any SMP barriers?

I would say we should probably just drop the whole set and start over.
If we don't need the wmb() we are going to need to go through and
clean up all of these paths and consider the barriers when updating
the layout of the code.

For example I have been thinking about it and in the case of x86 we
are probably better off not bothering with the wmb() and
writel_relaxed() and instead switch over to the smp_wmb() and writel()
since in the case of a strongly ordered system like x86 or sparc this
ends up translating out to a couple of compile barriers.

I will also need time to reevaluate the Rx paths since dropping the
wmb() may have other effects which I need to verify.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 17:47     ` Alexander Duyck
  0 siblings, 0 replies; 40+ messages in thread
From: Alexander Duyck @ 2018-04-03 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Alex,
>
> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory. I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Change since 7:
>>
>> * API clarification with Linus and several architecture folks regarding
>> writel()
>>
>> https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
>>
>> * removed wmb() in front of writel() at several places.
>> * remove old IA64 comments regarding ordering.
>>
>
> What do you think about this version? Did I miss any SMP barriers?

I would say we should probably just drop the whole set and start over.
If we don't need the wmb() we are going to need to go through and
clean up all of these paths and consider the barriers when updating
the layout of the code.

For example I have been thinking about it and in the case of x86 we
are probably better off not bothering with the wmb() and
writel_relaxed() and instead switch over to the smp_wmb() and writel()
since in the case of a strongly ordered system like x86 or sparc this
ends up translating out to a couple of compile barriers.

I will also need time to reevaluate the Rx paths since dropping the
wmb() may have other effects which I need to verify.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 17:47     ` Alexander Duyck
  0 siblings, 0 replies; 40+ messages in thread
From: Alexander Duyck @ 2018-04-03 17:47 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Alex,
>
> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory. I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Change since 7:
>>
>> * API clarification with Linus and several architecture folks regarding
>> writel()
>>
>> https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
>>
>> * removed wmb() in front of writel() at several places.
>> * remove old IA64 comments regarding ordering.
>>
>
> What do you think about this version? Did I miss any SMP barriers?

I would say we should probably just drop the whole set and start over.
If we don't need the wmb() we are going to need to go through and
clean up all of these paths and consider the barriers when updating
the layout of the code.

For example I have been thinking about it and in the case of x86 we
are probably better off not bothering with the wmb() and
writel_relaxed() and instead switch over to the smp_wmb() and writel()
since in the case of a strongly ordered system like x86 or sparc this
ends up translating out to a couple of compile barriers.

I will also need time to reevaluate the Rx paths since dropping the
wmb() may have other effects which I need to verify.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-03 17:47     ` Alexander Duyck
  (?)
@ 2018-04-03 17:50       ` Sinan Kaya
  -1 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-03 17:50 UTC (permalink / raw)
  To: Alexander Duyck, intel-wired-lan
  Cc: Jeff Kirsher, Netdev, Timur Tabi, sulrich, linux-arm-msm,
	linux-arm-kernel

On 4/3/2018 1:47 PM, Alexander Duyck wrote:
> On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Alex,
>>
>> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel() in multiple places. writel()
>>> already has a barrier on some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers
>>> directory. I scrubbed the ones I care about in this series.
>>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>>
>>> We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>>
>>> We start with mostly used ones and hope to increase the coverage over time.
>>> It will take a while to cover all drivers.
>>>
>>> Feel free to apply patches individually.
>>>
>>> Change since 7:
>>>
>>> * API clarification with Linus and several architecture folks regarding
>>> writel()
>>>
>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>>
>>> * removed wmb() in front of writel() at several places.
>>> * remove old IA64 comments regarding ordering.
>>>
>>
>> What do you think about this version? Did I miss any SMP barriers?
> 
> I would say we should probably just drop the whole set and start over.
> If we don't need the wmb() we are going to need to go through and
> clean up all of these paths and consider the barriers when updating
> the layout of the code.
> 
> For example I have been thinking about it and in the case of x86 we
> are probably better off not bothering with the wmb() and
> writel_relaxed() and instead switch over to the smp_wmb() and writel()
> since in the case of a strongly ordered system like x86 or sparc this
> ends up translating out to a couple of compile barriers.
> 
> I will also need time to reevaluate the Rx paths since dropping the
> wmb() may have other effects which I need to verify.

Sounds good, I'll let you work on it.

@Jeff Kirsher: can you drop this version from your development branch until
Alex posts the next version?

> 
> Thanks.
> 
> - Alex
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 17:50       ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-03 17:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/3/2018 1:47 PM, Alexander Duyck wrote:
> On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Alex,
>>
>> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel() in multiple places. writel()
>>> already has a barrier on some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers
>>> directory. I scrubbed the ones I care about in this series.
>>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>>
>>> We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>>
>>> We start with mostly used ones and hope to increase the coverage over time.
>>> It will take a while to cover all drivers.
>>>
>>> Feel free to apply patches individually.
>>>
>>> Change since 7:
>>>
>>> * API clarification with Linus and several architecture folks regarding
>>> writel()
>>>
>>> https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
>>>
>>> * removed wmb() in front of writel() at several places.
>>> * remove old IA64 comments regarding ordering.
>>>
>>
>> What do you think about this version? Did I miss any SMP barriers?
> 
> I would say we should probably just drop the whole set and start over.
> If we don't need the wmb() we are going to need to go through and
> clean up all of these paths and consider the barriers when updating
> the layout of the code.
> 
> For example I have been thinking about it and in the case of x86 we
> are probably better off not bothering with the wmb() and
> writel_relaxed() and instead switch over to the smp_wmb() and writel()
> since in the case of a strongly ordered system like x86 or sparc this
> ends up translating out to a couple of compile barriers.
> 
> I will also need time to reevaluate the Rx paths since dropping the
> wmb() may have other effects which I need to verify.

Sounds good, I'll let you work on it.

@Jeff Kirsher: can you drop this version from your development branch until
Alex posts the next version?

> 
> Thanks.
> 
> - Alex
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 17:50       ` Sinan Kaya
  0 siblings, 0 replies; 40+ messages in thread
From: Sinan Kaya @ 2018-04-03 17:50 UTC (permalink / raw)
  To: intel-wired-lan

On 4/3/2018 1:47 PM, Alexander Duyck wrote:
> On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Alex,
>>
>> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel() in multiple places. writel()
>>> already has a barrier on some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers
>>> directory. I scrubbed the ones I care about in this series.
>>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>>
>>> We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>>
>>> We start with mostly used ones and hope to increase the coverage over time.
>>> It will take a while to cover all drivers.
>>>
>>> Feel free to apply patches individually.
>>>
>>> Change since 7:
>>>
>>> * API clarification with Linus and several architecture folks regarding
>>> writel()
>>>
>>> https://www.mail-archive.com/netdev at vger.kernel.org/msg225806.html
>>>
>>> * removed wmb() in front of writel() at several places.
>>> * remove old IA64 comments regarding ordering.
>>>
>>
>> What do you think about this version? Did I miss any SMP barriers?
> 
> I would say we should probably just drop the whole set and start over.
> If we don't need the wmb() we are going to need to go through and
> clean up all of these paths and consider the barriers when updating
> the layout of the code.
> 
> For example I have been thinking about it and in the case of x86 we
> are probably better off not bothering with the wmb() and
> writel_relaxed() and instead switch over to the smp_wmb() and writel()
> since in the case of a strongly ordered system like x86 or sparc this
> ends up translating out to a couple of compile barriers.
> 
> I will also need time to reevaluate the Rx paths since dropping the
> wmb() may have other effects which I need to verify.

Sounds good, I'll let you work on it.

@Jeff Kirsher: can you drop this version from your development branch until
Alex posts the next version?

> 
> Thanks.
> 
> - Alex
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-03 17:50       ` Sinan Kaya
  (?)
@ 2018-04-03 18:26         ` Jeff Kirsher
  -1 siblings, 0 replies; 40+ messages in thread
From: Jeff Kirsher @ 2018-04-03 18:26 UTC (permalink / raw)
  To: Sinan Kaya, Alexander Duyck, intel-wired-lan
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

On Tue, 2018-04-03 at 13:50 -0400, Sinan Kaya wrote:
> > > What do you think about this version? Did I miss any SMP
> > > barriers?
> > 
> > I would say we should probably just drop the whole set and start
> > over.
> > If we don't need the wmb() we are going to need to go through and
> > clean up all of these paths and consider the barriers when updating
> > the layout of the code.
> > 
> > For example I have been thinking about it and in the case of x86 we
> > are probably better off not bothering with the wmb() and
> > writel_relaxed() and instead switch over to the smp_wmb() and
> > writel()
> > since in the case of a strongly ordered system like x86 or sparc
> > this
> > ends up translating out to a couple of compile barriers.
> > 
> > I will also need time to reevaluate the Rx paths since dropping the
> > wmb() may have other effects which I need to verify.
> 
> Sounds good, I'll let you work on it.
> 
> @Jeff Kirsher: can you drop this version from your development branch
> until
> Alex posts the next version?

Already on it, I will work with Alex on any possible future versions of
these patches.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 18:26         ` Jeff Kirsher
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Kirsher @ 2018-04-03 18:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2018-04-03 at 13:50 -0400, Sinan Kaya wrote:
> > > What do you think about this version? Did I miss any SMP
> > > barriers?
> > 
> > I would say we should probably just drop the whole set and start
> > over.
> > If we don't need the wmb() we are going to need to go through and
> > clean up all of these paths and consider the barriers when updating
> > the layout of the code.
> > 
> > For example I have been thinking about it and in the case of x86 we
> > are probably better off not bothering with the wmb() and
> > writel_relaxed() and instead switch over to the smp_wmb() and
> > writel()
> > since in the case of a strongly ordered system like x86 or sparc
> > this
> > ends up translating out to a couple of compile barriers.
> > 
> > I will also need time to reevaluate the Rx paths since dropping the
> > wmb() may have other effects which I need to verify.
> 
> Sounds good, I'll let you work on it.
> 
> @Jeff Kirsher: can you drop this version from your development branch
> until
> Alex posts the next version?

Already on it, I will work with Alex on any possible future versions of
these patches.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180403/df3259a9/attachment.sig>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-04-03 18:26         ` Jeff Kirsher
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Kirsher @ 2018-04-03 18:26 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, 2018-04-03 at 13:50 -0400, Sinan Kaya wrote:
> > > What do you think about this version? Did I miss any SMP
> > > barriers?
> > 
> > I would say we should probably just drop the whole set and start
> > over.
> > If we don't need the wmb() we are going to need to go through and
> > clean up all of these paths and consider the barriers when updating
> > the layout of the code.
> > 
> > For example I have been thinking about it and in the case of x86 we
> > are probably better off not bothering with the wmb() and
> > writel_relaxed() and instead switch over to the smp_wmb() and
> > writel()
> > since in the case of a strongly ordered system like x86 or sparc
> > this
> > ends up translating out to a couple of compile barriers.
> > 
> > I will also need time to reevaluate the Rx paths since dropping the
> > wmb() may have other effects which I need to verify.
> 
> Sounds good, I'll let you work on it.
> 
> @Jeff Kirsher: can you drop this version from your development branch
> until
> Alex posts the next version?

Already on it, I will work with Alex on any possible future versions of
these patches.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20180403/df3259a9/attachment-0001.asc>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 6/7] ixgbevf: keep writel() closer to wmb()
  2018-04-02 19:06   ` Sinan Kaya
  (?)
  (?)
@ 2018-04-03 18:51   ` Bowers, AndrewX
  -1 siblings, 0 replies; 40+ messages in thread
From: Bowers, AndrewX @ 2018-04-03 18:51 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Sinan Kaya
> Sent: Monday, April 2, 2018 12:06 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: sulrich at codeaurora.org; netdev at vger.kernel.org;
> timur at codeaurora.org; linux-kernel at vger.kernel.org; Sinan Kaya
> <okaya@codeaurora.org>; intel-wired-lan at lists.osuosl.org; linux-arm-
> msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: [Intel-wired-lan] [PATCH v8 6/7] ixgbevf: keep writel() closer to
> wmb()
> 
> Remove ixgbevf_write_tail() in favor of moving writel() close to wmb().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
>  2 files changed, 3 insertions(+), 8 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06   ` Sinan Kaya
  (?)
  (?)
@ 2018-04-03 18:51   ` Bowers, AndrewX
  -1 siblings, 0 replies; 40+ messages in thread
From: Bowers, AndrewX @ 2018-04-03 18:51 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Sinan Kaya
> Sent: Monday, April 2, 2018 12:07 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: sulrich at codeaurora.org; netdev at vger.kernel.org;
> timur at codeaurora.org; linux-kernel at vger.kernel.org; Sinan Kaya
> <okaya@codeaurora.org>; intel-wired-lan at lists.osuosl.org; linux-arm-
> msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: [Intel-wired-lan] [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers
> on weakly-ordered archs
> 
> memory-barriers.txt has been updated as follows:
> 
> "When using writel(), a prior wmb() is not needed to guarantee that the
> cache coherent memory writes have completed before writing to the MMIO
> region."
> 
> Remove old IA-64 comments in the code along with unneeded wmb() in
> front of writel().
> 
> There are places in the code where wmb() has been used as a double barrier
> for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
> such places, keep the wmb() but replace the following writel() with
> writel_relaxed() to have a sequence as
> 
> wmb()
> writel_relaxed()
> mmio_wb()
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 +++++++-------------
> ---
>  1 file changed, 7 insertions(+), 16 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06   ` Sinan Kaya
  (?)
  (?)
@ 2018-04-03 18:52   ` Bowers, AndrewX
  -1 siblings, 0 replies; 40+ messages in thread
From: Bowers, AndrewX @ 2018-04-03 18:52 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Sinan Kaya
> Sent: Monday, April 2, 2018 12:06 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: sulrich at codeaurora.org; netdev at vger.kernel.org;
> timur at codeaurora.org; linux-kernel at vger.kernel.org; Sinan Kaya
> <okaya@codeaurora.org>; intel-wired-lan at lists.osuosl.org; linux-arm-
> msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: [Intel-wired-lan] [PATCH v8 1/7] i40e/i40evf: Eliminate duplicate
> barriers on weakly-ordered archs
> 
> memory-barriers.txt has been updated as follows:
> 
> "When using writel(), a prior wmb() is not needed to guarantee that the
> cache coherent memory writes have completed before writing to the MMIO
> region."
> 
> Remove old IA-64 comments in the code along with unneeded wmb() in
> front of writel().
> 
> There are places in the code where wmb() has been used as a double barrier
> for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
> such places, keep the wmb() but replace the following writel() with
> writel_relaxed() to have a sequence as
> 
> wmb()
> writel_relaxed()
> mmio_wb()
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +++++++++-------------
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  8 +-------
>  2 files changed, 10 insertions(+), 20 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Intel-wired-lan] [PATCH v8 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
  2018-04-02 19:06   ` Sinan Kaya
                     ` (2 preceding siblings ...)
  (?)
@ 2018-04-03 18:52   ` Bowers, AndrewX
  -1 siblings, 0 replies; 40+ messages in thread
From: Bowers, AndrewX @ 2018-04-03 18:52 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Sinan Kaya
> Sent: Monday, April 2, 2018 12:06 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: sulrich at codeaurora.org; netdev at vger.kernel.org;
> timur at codeaurora.org; linux-kernel at vger.kernel.org; Sinan Kaya
> <okaya@codeaurora.org>; intel-wired-lan at lists.osuosl.org; linux-arm-
> msm at vger.kernel.org; linux-arm-kernel at lists.infradead.org
> Subject: [Intel-wired-lan] [PATCH v8 2/7] ixgbe: eliminate duplicate barriers
> on weakly-ordered archs
> 
> memory-barriers.txt has been updated as follows:
> 
> "When using writel(), a prior wmb() is not needed to guarantee that the
> cache coherent memory writes have completed before writing to the MMIO
> region."
> 
> Remove old IA-64 comments in the code along with unneeded wmb() in
> front of writel().
> 
> There are places in the code where wmb() has been used as a double barrier
> for CPU and IO in place of smp_wmb() and wmb() as an optimization. For
> such places, keep the wmb() but replace the following writel() with
> writel_relaxed() to have a sequence as
> 
> wmb()
> writel_relaxed()
> mmio_wb()
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++---------------------
>  1 file changed, 2 insertions(+), 21 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-04-03 18:52 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02 19:06 [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-04-02 19:06 ` Sinan Kaya
2018-04-02 19:06 ` [PATCH v8 1/7] i40e/i40evf: " Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-03 18:52   ` [Intel-wired-lan] " Bowers, AndrewX
2018-04-02 19:06 ` [PATCH v8 2/7] ixgbe: eliminate " Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-03 18:52   ` [Intel-wired-lan] " Bowers, AndrewX
2018-04-02 19:06 ` [PATCH v8 3/7] igbvf: " Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-02 19:06 ` [PATCH v8 4/7] igb: " Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-02 19:06 ` [PATCH v8 5/7] fm10k: Eliminate " Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-02 19:06 ` [PATCH v8 6/7] ixgbevf: keep writel() closer to wmb() Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-03 18:51   ` [Intel-wired-lan] " Bowers, AndrewX
2018-04-02 19:06 ` [PATCH v8 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-04-02 19:06   ` [Intel-wired-lan] " Sinan Kaya
2018-04-02 19:06   ` Sinan Kaya
2018-04-03 18:51   ` [Intel-wired-lan] " Bowers, AndrewX
2018-04-03  2:59 ` [PATCH v8 0/7] netdev: intel: Eliminate " Sinan Kaya
2018-04-03  2:59   ` Sinan Kaya
2018-04-03 17:47   ` Alexander Duyck
2018-04-03 17:47     ` [Intel-wired-lan] " Alexander Duyck
2018-04-03 17:47     ` Alexander Duyck
2018-04-03 17:50     ` Sinan Kaya
2018-04-03 17:50       ` [Intel-wired-lan] " Sinan Kaya
2018-04-03 17:50       ` Sinan Kaya
2018-04-03 18:26       ` Jeff Kirsher
2018-04-03 18:26         ` [Intel-wired-lan] " Jeff Kirsher
2018-04-03 18:26         ` Jeff Kirsher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.