[PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52 ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: sulrich, netdev, timur, Sinan Kaya, linux-arm-msm, linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v6:
clean up between 2..6 and then make your Alex's changes on 1 and 7
    The mmiowb shouldn't be needed for Rx. Only one CPU will be running
    NAPI for the queue and we will synchronize this with a full writel
    anyway when we re-enable the interrupts.

Sinan Kaya (7):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
 drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
 8 files changed, 30 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52 ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.

I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.

We used relaxed API heavily on ARM for a long time but
it did not exist on other architectures. For this reason, relaxed
architectures have been paying double penalty in order to use the common
drivers.

Now that relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.

We start with mostly used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.

Feel free to apply patches individually.

Changes since v6:
clean up between 2..6 and then make your Alex's changes on 1 and 7
    The mmiowb shouldn't be needed for Rx. Only one CPU will be running
    NAPI for the queue and we will synchronize this with a full writel
    anyway when we re-enable the interrupts.

Sinan Kaya (7):
  i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  ixgbe: eliminate duplicate barriers on weakly-ordered archs
  igbvf: eliminate duplicate barriers on weakly-ordered archs
  igb: eliminate duplicate barriers on weakly-ordered archs
  fm10k: Eliminate duplicate barriers on weakly-ordered archs
  ixgbevf: keep writel() closer to wmb()
  ixgbevf: eliminate duplicate barriers on weakly-ordered archs

 drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
 drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
 8 files changed, 30 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 14 ++++++++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  4 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c6972bd..d970350 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1529,7 +1535,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2412,7 +2418,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		 */
 		wmb();
 
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	rx_ring->skb = skb;
@@ -3437,7 +3443,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 1ae112f..5db5c50c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -810,7 +810,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2379,7 +2379,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 14 ++++++++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  4 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c6972bd..d970350 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1529,7 +1535,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2412,7 +2418,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		 */
 		wmb();
 
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	rx_ring->skb = skb;
@@ -3437,7 +3443,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 1ae112f..5db5c50c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -810,7 +810,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2379,7 +2379,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 14 ++++++++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  4 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c6972bd..d970350 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;
 
-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+
 	return 0;
 
 dma_fail:
@@ -1529,7 +1535,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2412,7 +2418,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		 */
 		wmb();
 
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	rx_ring->skb = skb;
@@ -3437,7 +3443,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 1ae112f..5db5c50c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -810,7 +810,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }
 
 /**
@@ -2379,7 +2379,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: sulrich, netdev, timur, linux-kernel, Sinan Kaya,
	intel-wired-lan, linux-arm-msm, linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3b32ea..fb80edb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1701,7 +1701,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2470,7 +2470,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -8101,7 +8101,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10038,7 +10038,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3b32ea..fb80edb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1701,7 +1701,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2470,7 +2470,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -8101,7 +8101,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10038,7 +10038,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3b32ea..fb80edb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1701,7 +1701,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2470,7 +2470,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -8101,7 +8101,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10038,7 +10038,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3b32ea..fb80edb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1701,7 +1701,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -2470,7 +2470,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(ring->next_to_use, ring->tail);
+		writel_relaxed(ring->next_to_use, ring->tail);
 
 		xdp_do_flush_map();
 	}
@@ -8101,7 +8101,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -10038,7 +10038,7 @@ static void ixgbe_xdp_flush(struct net_device *dev)
 	 * are new descriptors to fetch.
 	 */
 	wmb();
-	writel(ring->next_to_use, ring->tail);
+	writel_relaxed(ring->next_to_use, ring->tail);
 
 	return;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: sulrich, netdev, timur, linux-kernel, Sinan Kaya,
	intel-wired-lan, linux-arm-msm, linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index fa07876..df2283b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -252,7 +252,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
@@ -2298,7 +2298,7 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index fa07876..df2283b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -252,7 +252,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
@@ -2298,7 +2298,7 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index fa07876..df2283b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -252,7 +252,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
@@ -2298,7 +2298,7 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index fa07876..df2283b 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -252,7 +252,7 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
 		 * such as IA-64).
 		*/
 		wmb();
-		writel(i, adapter->hw.hw_addr + rx_ring->tail);
+		writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
 	}
 }
 
@@ -2298,7 +2298,7 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
 
 	tx_ring->buffer_info[first].next_to_watch = tx_desc;
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
+	writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
 	/* we need this if more than one processor can write to our tail
 	 * at a time, it synchronizes IO on IA64/Altix systems
 	 */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 3d4ff3c..c501378 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5674,7 +5674,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8079,7 +8079,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 3d4ff3c..c501378 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5674,7 +5674,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8079,7 +8079,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 3d4ff3c..c501378 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5674,7 +5674,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
@@ -8079,7 +8079,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 409554d..bb94f04 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -180,7 +180,7 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		wmb();
 
 		/* notify hardware of new descriptors */
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1055,7 +1055,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 409554d..bb94f04 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -180,7 +180,7 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		wmb();
 
 		/* notify hardware of new descriptors */
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1055,7 +1055,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing
the register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 409554d..bb94f04 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -180,7 +180,7 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
 		wmb();
 
 		/* notify hardware of new descriptors */
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1055,7 +1055,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
 
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);
 
 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 6/7] ixgbevf: keep writel() closer to wmb()
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 18:52   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f712646..f97091d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3d9033f..815cb1a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 6/7] ixgbevf: keep writel() closer to wmb()
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: linux-arm-kernel

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f712646..f97091d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3d9033f..815cb1a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 6/7] ixgbevf: keep writel() closer to wmb()
@ 2018-03-23 18:52   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:52 UTC (permalink / raw)
  To: intel-wired-lan

Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f712646..f97091d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
 	return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
 }
 
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
-	writel(value, ring->tail);
-}
-
 #define IXGBEVF_RX_DESC(R, i)	\
 	(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
 #define IXGBEVF_TX_DESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3d9033f..815cb1a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		ixgbevf_write_tail(rx_ring, i);
+		writel(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	ixgbevf_write_tail(tx_ring, i);
+	writel(i, tx_ring->tail);
 
 	return;
 dma_error:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 18:53   ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:53 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
	Sinan Kaya, intel-wired-lan, linux-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 815cb1a..7277647 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:53   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:53 UTC (permalink / raw)
  To: linux-arm-kernel

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 815cb1a..7277647 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 18:53   ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-23 18:53 UTC (permalink / raw)
  To: intel-wired-lan

Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 815cb1a..7277647 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		 * such as IA-64).
 		 */
 		wmb();
-		writel(i, rx_ring->tail);
+		writel_relaxed(i, rx_ring->tail);
 	}
 }
 
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * know there are new descriptors to fetch.
 		 */
 		wmb();
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	tx_ring->next_to_use = i;
 
 	/* notify HW of packet */
-	writel(i, tx_ring->tail);
+	writel_relaxed(i, tx_ring->tail);
+
+	/* We need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
 
 	return;
 dma_error:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 18:52 ` Sinan Kaya
  (?)
@ 2018-03-23 21:53   ` Alexander Duyck
  -1 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-23 21:53 UTC (permalink / raw)
  To: Sinan Kaya, intel-wired-lan
  Cc: sulrich, Netdev, Timur Tabi, Jeff Kirsher, linux-arm-msm,
	linux-arm-kernel

On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
>
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
>
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
>
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> Feel free to apply patches individually.

I looked over the set and they seem good.

Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

>
> Changes since v6:
> clean up between 2..6 and then make your Alex's changes on 1 and 7
>     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>     NAPI for the queue and we will synchronize this with a full writel
>     anyway when we re-enable the interrupts.
>
> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>  8 files changed, 30 insertions(+), 24 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 21:53   ` Alexander Duyck
  0 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-23 21:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
>
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
>
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
>
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> Feel free to apply patches individually.

I looked over the set and they seem good.

Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

>
> Changes since v6:
> clean up between 2..6 and then make your Alex's changes on 1 and 7
>     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>     NAPI for the queue and we will synchronize this with a full writel
>     anyway when we re-enable the interrupts.
>
> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>  8 files changed, 30 insertions(+), 24 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 21:53   ` Alexander Duyck
  0 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-23 21:53 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Code includes wmb() followed by writel() in multiple places. writel()
> already has a barrier on some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> I did a regex search for wmb() followed by writel() in each drivers
> directory.
> I scrubbed the ones I care about in this series.
>
> I considered "ease of change", "popular usage" and "performance critical
> path" as the determining criteria for my filtering.
>
> We used relaxed API heavily on ARM for a long time but
> it did not exist on other architectures. For this reason, relaxed
> architectures have been paying double penalty in order to use the common
> drivers.
>
> Now that relaxed API is present on all architectures, we can go and scrub
> all drivers to see what needs to change and what can remain.
>
> We start with mostly used ones and hope to increase the coverage over time.
> It will take a while to cover all drivers.
>
> Feel free to apply patches individually.

I looked over the set and they seem good.

Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

>
> Changes since v6:
> clean up between 2..6 and then make your Alex's changes on 1 and 7
>     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>     NAPI for the queue and we will synchronize this with a full writel
>     anyway when we re-enable the interrupts.
>
> Sinan Kaya (7):
>   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>   igbvf: eliminate duplicate barriers on weakly-ordered archs
>   igb: eliminate duplicate barriers on weakly-ordered archs
>   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>   ixgbevf: keep writel() closer to wmb()
>   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>  8 files changed, 30 insertions(+), 24 deletions(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 21:53   ` Alexander Duyck
  (?)
@ 2018-03-23 23:58     ` Jeff Kirsher
  -1 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-23 23:58 UTC (permalink / raw)
  To: Alexander Duyck, Sinan Kaya, intel-wired-lan
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-kernel, linux-arm-msm


[-- Attachment #1.1: Type: text/plain, Size: 3094 bytes --]

On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
> wrote:
> > Code includes wmb() followed by writel() in multiple places. writel()
> > already has a barrier on some architectures like arm64.
> > 
> > This ends up CPU observing two barriers back to back before executing
> > the
> > register write.
> > 
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> > 
> > I did a regex search for wmb() followed by writel() in each drivers
> > directory.
> > I scrubbed the ones I care about in this series.
> > 
> > I considered "ease of change", "popular usage" and "performance
> > critical
> > path" as the determining criteria for my filtering.
> > 
> > We used relaxed API heavily on ARM for a long time but
> > it did not exist on other architectures. For this reason, relaxed
> > architectures have been paying double penalty in order to use the
> > common
> > drivers.
> > 
> > Now that relaxed API is present on all architectures, we can go and
> > scrub
> > all drivers to see what needs to change and what can remain.
> > 
> > We start with mostly used ones and hope to increase the coverage over
> > time.
> > It will take a while to cover all drivers.
> > 
> > Feel free to apply patches individually.
> 
> I looked over the set and they seem good.
> 
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
branch).  I will deal with this series in a day or two, after I have dealt
with my driver pull requests.

> > 
> > Changes since v6:
> > clean up between 2..6 and then make your Alex's changes on 1 and 7
> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
> >     NAPI for the queue and we will synchronize this with a full writel
> >     anyway when we re-enable the interrupts.
> > 
> > Sinan Kaya (7):
> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
> >   igb: eliminate duplicate barriers on weakly-ordered archs
> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbevf: keep writel() closer to wmb()
> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> > 
> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
> >  8 files changed, 30 insertions(+), 24 deletions(-)
> > 
> > --
> > 2.7.4
> > 

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 23:58     ` Jeff Kirsher
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-23 23:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
> wrote:
> > Code includes wmb() followed by writel() in multiple places. writel()
> > already has a barrier on some architectures like arm64.
> > 
> > This ends up CPU observing two barriers back to back before executing
> > the
> > register write.
> > 
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> > 
> > I did a regex search for wmb() followed by writel() in each drivers
> > directory.
> > I scrubbed the ones I care about in this series.
> > 
> > I considered "ease of change", "popular usage" and "performance
> > critical
> > path" as the determining criteria for my filtering.
> > 
> > We used relaxed API heavily on ARM for a long time but
> > it did not exist on other architectures. For this reason, relaxed
> > architectures have been paying double penalty in order to use the
> > common
> > drivers.
> > 
> > Now that relaxed API is present on all architectures, we can go and
> > scrub
> > all drivers to see what needs to change and what can remain.
> > 
> > We start with mostly used ones and hope to increase the coverage over
> > time.
> > It will take a while to cover all drivers.
> > 
> > Feel free to apply patches individually.
> 
> I looked over the set and they seem good.
> 
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
branch).  I will deal with this series in a day or two, after I have dealt
with my driver pull requests.

> > 
> > Changes since v6:
> > clean up between 2..6 and then make your Alex's changes on 1 and 7
> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
> >     NAPI for the queue and we will synchronize this with a full writel
> >     anyway when we re-enable the interrupts.
> > 
> > Sinan Kaya (7):
> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
> >   igb: eliminate duplicate barriers on weakly-ordered archs
> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbevf: keep writel() closer to wmb()
> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> > 
> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
> >  8 files changed, 30 insertions(+), 24 deletions(-)
> > 
> > --
> > 2.7.4
> > 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180323/777698b2/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-23 23:58     ` Jeff Kirsher
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-23 23:58 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
> wrote:
> > Code includes wmb() followed by writel() in multiple places. writel()
> > already has a barrier on some architectures like arm64.
> > 
> > This ends up CPU observing two barriers back to back before executing
> > the
> > register write.
> > 
> > Since code already has an explicit barrier call, changing writel() to
> > writel_relaxed().
> > 
> > I did a regex search for wmb() followed by writel() in each drivers
> > directory.
> > I scrubbed the ones I care about in this series.
> > 
> > I considered "ease of change", "popular usage" and "performance
> > critical
> > path" as the determining criteria for my filtering.
> > 
> > We used relaxed API heavily on ARM for a long time but
> > it did not exist on other architectures. For this reason, relaxed
> > architectures have been paying double penalty in order to use the
> > common
> > drivers.
> > 
> > Now that relaxed API is present on all architectures, we can go and
> > scrub
> > all drivers to see what needs to change and what can remain.
> > 
> > We start with mostly used ones and hope to increase the coverage over
> > time.
> > It will take a while to cover all drivers.
> > 
> > Feel free to apply patches individually.
> 
> I looked over the set and they seem good.
> 
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
branch).  I will deal with this series in a day or two, after I have dealt
with my driver pull requests.

> > 
> > Changes since v6:
> > clean up between 2..6 and then make your Alex's changes on 1 and 7
> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
> >     NAPI for the queue and we will synchronize this with a full writel
> >     anyway when we re-enable the interrupts.
> > 
> > Sinan Kaya (7):
> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
> >   igb: eliminate duplicate barriers on weakly-ordered archs
> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
> >   ixgbevf: keep writel() closer to wmb()
> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
> > 
> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
> >  8 files changed, 30 insertions(+), 24 deletions(-)
> > 
> > --
> > 2.7.4
> > 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20180323/777698b2/attachment.asc>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-23 23:58     ` Jeff Kirsher
  (?)
@ 2018-03-24  2:34       ` okaya at codeaurora.org
  -1 siblings, 0 replies; 58+ messages in thread
From: okaya @ 2018-03-24  2:34 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: sulrich, Netdev, Timur Tabi, Alexander Duyck, intel-wired-lan,
	linux-arm-msm, linux-arm-kernel

On 2018-03-23 19:58, Jeff Kirsher wrote:
> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>> wrote:
>> > Code includes wmb() followed by writel() in multiple places. writel()
>> > already has a barrier on some architectures like arm64.
>> >
>> > This ends up CPU observing two barriers back to back before executing
>> > the
>> > register write.
>> >
>> > Since code already has an explicit barrier call, changing writel() to
>> > writel_relaxed().
>> >
>> > I did a regex search for wmb() followed by writel() in each drivers
>> > directory.
>> > I scrubbed the ones I care about in this series.
>> >
>> > I considered "ease of change", "popular usage" and "performance
>> > critical
>> > path" as the determining criteria for my filtering.
>> >
>> > We used relaxed API heavily on ARM for a long time but
>> > it did not exist on other architectures. For this reason, relaxed
>> > architectures have been paying double penalty in order to use the
>> > common
>> > drivers.
>> >
>> > Now that relaxed API is present on all architectures, we can go and
>> > scrub
>> > all drivers to see what needs to change and what can remain.
>> >
>> > We start with mostly used ones and hope to increase the coverage over
>> > time.
>> > It will take a while to cover all drivers.
>> >
>> > Feel free to apply patches individually.
>> 
>> I looked over the set and they seem good.
>> 
>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
> branch).  I will deal with this series in a day or two, after I have 
> dealt
> with my driver pull requests.

Sorry, you will have to replace the ones you took from me.
> 
>> >
>> > Changes since v6:
>> > clean up between 2..6 and then make your Alex's changes on 1 and 7
>> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>> >     NAPI for the queue and we will synchronize this with a full writel
>> >     anyway when we re-enable the interrupts.
>> >
>> > Sinan Kaya (7):
>> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
>> >   igb: eliminate duplicate barriers on weakly-ordered archs
>> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbevf: keep writel() closer to wmb()
>> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>> >
>> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>> >  8 files changed, 30 insertions(+), 24 deletions(-)
>> >
>> > --
>> > 2.7.4
>> >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-24  2:34       ` okaya at codeaurora.org
  0 siblings, 0 replies; 58+ messages in thread
From: okaya at codeaurora.org @ 2018-03-24  2:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018-03-23 19:58, Jeff Kirsher wrote:
> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>> wrote:
>> > Code includes wmb() followed by writel() in multiple places. writel()
>> > already has a barrier on some architectures like arm64.
>> >
>> > This ends up CPU observing two barriers back to back before executing
>> > the
>> > register write.
>> >
>> > Since code already has an explicit barrier call, changing writel() to
>> > writel_relaxed().
>> >
>> > I did a regex search for wmb() followed by writel() in each drivers
>> > directory.
>> > I scrubbed the ones I care about in this series.
>> >
>> > I considered "ease of change", "popular usage" and "performance
>> > critical
>> > path" as the determining criteria for my filtering.
>> >
>> > We used relaxed API heavily on ARM for a long time but
>> > it did not exist on other architectures. For this reason, relaxed
>> > architectures have been paying double penalty in order to use the
>> > common
>> > drivers.
>> >
>> > Now that relaxed API is present on all architectures, we can go and
>> > scrub
>> > all drivers to see what needs to change and what can remain.
>> >
>> > We start with mostly used ones and hope to increase the coverage over
>> > time.
>> > It will take a while to cover all drivers.
>> >
>> > Feel free to apply patches individually.
>> 
>> I looked over the set and they seem good.
>> 
>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
> branch).  I will deal with this series in a day or two, after I have 
> dealt
> with my driver pull requests.

Sorry, you will have to replace the ones you took from me.
> 
>> >
>> > Changes since v6:
>> > clean up between 2..6 and then make your Alex's changes on 1 and 7
>> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>> >     NAPI for the queue and we will synchronize this with a full writel
>> >     anyway when we re-enable the interrupts.
>> >
>> > Sinan Kaya (7):
>> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
>> >   igb: eliminate duplicate barriers on weakly-ordered archs
>> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbevf: keep writel() closer to wmb()
>> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>> >
>> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>> >  8 files changed, 30 insertions(+), 24 deletions(-)
>> >
>> > --
>> > 2.7.4
>> >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-24  2:34       ` okaya at codeaurora.org
  0 siblings, 0 replies; 58+ messages in thread
From: okaya @ 2018-03-24  2:34 UTC (permalink / raw)
  To: intel-wired-lan

On 2018-03-23 19:58, Jeff Kirsher wrote:
> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>> wrote:
>> > Code includes wmb() followed by writel() in multiple places. writel()
>> > already has a barrier on some architectures like arm64.
>> >
>> > This ends up CPU observing two barriers back to back before executing
>> > the
>> > register write.
>> >
>> > Since code already has an explicit barrier call, changing writel() to
>> > writel_relaxed().
>> >
>> > I did a regex search for wmb() followed by writel() in each drivers
>> > directory.
>> > I scrubbed the ones I care about in this series.
>> >
>> > I considered "ease of change", "popular usage" and "performance
>> > critical
>> > path" as the determining criteria for my filtering.
>> >
>> > We used relaxed API heavily on ARM for a long time but
>> > it did not exist on other architectures. For this reason, relaxed
>> > architectures have been paying double penalty in order to use the
>> > common
>> > drivers.
>> >
>> > Now that relaxed API is present on all architectures, we can go and
>> > scrub
>> > all drivers to see what needs to change and what can remain.
>> >
>> > We start with mostly used ones and hope to increase the coverage over
>> > time.
>> > It will take a while to cover all drivers.
>> >
>> > Feel free to apply patches individually.
>> 
>> I looked over the set and they seem good.
>> 
>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
> branch).  I will deal with this series in a day or two, after I have 
> dealt
> with my driver pull requests.

Sorry, you will have to replace the ones you took from me.
> 
>> >
>> > Changes since v6:
>> > clean up between 2..6 and then make your Alex's changes on 1 and 7
>> >     The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>> >     NAPI for the queue and we will synchronize this with a full writel
>> >     anyway when we re-enable the interrupts.
>> >
>> > Sinan Kaya (7):
>> >   i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbe: eliminate duplicate barriers on weakly-ordered archs
>> >   igbvf: eliminate duplicate barriers on weakly-ordered archs
>> >   igb: eliminate duplicate barriers on weakly-ordered archs
>> >   fm10k: Eliminate duplicate barriers on weakly-ordered archs
>> >   ixgbevf: keep writel() closer to wmb()
>> >   ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>> >
>> >  drivers/net/ethernet/intel/fm10k/fm10k_main.c     |  4 ++--
>> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 14 ++++++++++----
>> >  drivers/net/ethernet/intel/i40evf/i40e_txrx.c     |  4 ++--
>> >  drivers/net/ethernet/intel/igb/igb_main.c         |  4 ++--
>> >  drivers/net/ethernet/intel/igbvf/netdev.c         |  4 ++--
>> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  8 ++++----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  5 -----
>> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>> >  8 files changed, 30 insertions(+), 24 deletions(-)
>> >
>> > --
>> > 2.7.4
>> >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-24  2:34       ` okaya at codeaurora.org
  (?)
@ 2018-03-27 12:42         ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 12:42 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Alexander Duyck, intel-wired-lan, Netdev, Timur Tabi, sulrich,
	linux-arm-msm, linux-arm-kernel

Jeff,

On 3/23/2018 10:34 PM, okaya@codeaurora.org wrote:
> On 2018-03-23 19:58, Jeff Kirsher wrote:
>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>>> wrote:
>>> > Code includes wmb() followed by writel() in multiple places. writel()
>>> > already has a barrier on some architectures like arm64.
>>> >
>>> > This ends up CPU observing two barriers back to back before executing
>>> > the
>>> > register write.
>>> >
>>> > Since code already has an explicit barrier call, changing writel() to
>>> > writel_relaxed().
>>> >
>>> > I did a regex search for wmb() followed by writel() in each drivers
>>> > directory.
>>> > I scrubbed the ones I care about in this series.
>>> >
>>> > I considered "ease of change", "popular usage" and "performance
>>> > critical
>>> > path" as the determining criteria for my filtering.
>>> >
>>> > We used relaxed API heavily on ARM for a long time but
>>> > it did not exist on other architectures. For this reason, relaxed
>>> > architectures have been paying double penalty in order to use the
>>> > common
>>> > drivers.
>>> >
>>> > Now that relaxed API is present on all architectures, we can go and
>>> > scrub
>>> > all drivers to see what needs to change and what can remain.
>>> >
>>> > We start with mostly used ones and hope to increase the coverage over
>>> > time.
>>> > It will take a while to cover all drivers.
>>> >
>>> > Feel free to apply patches individually.
>>>
>>> I looked over the set and they seem good.
>>>
>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
>> branch).  I will deal with this series in a day or two, after I have dealt
>> with my driver pull requests.
> 
> Sorry, you will have to replace the ones you took from me.

Double sorry now.

I don't know if you have been following "RFC on writel and writel_relaxed" thread
or not but there are some new developments about wmb() requirement. 

Basically, wmb() should never be used before writel() as writel() seem to
provide coherency and observability guarantee.

wmb()+writel_relaxed() is slower on some architectures than plain writel()

I'll have to rework these patches to have writel() only. 

Are you able to drop the applied ones so that I can post V8 or is it too late?


Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 12:42         ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Jeff,

On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
> On 2018-03-23 19:58, Jeff Kirsher wrote:
>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>>> wrote:
>>> > Code includes wmb() followed by writel() in multiple places. writel()
>>> > already has a barrier on some architectures like arm64.
>>> >
>>> > This ends up CPU observing two barriers back to back before executing
>>> > the
>>> > register write.
>>> >
>>> > Since code already has an explicit barrier call, changing writel() to
>>> > writel_relaxed().
>>> >
>>> > I did a regex search for wmb() followed by writel() in each drivers
>>> > directory.
>>> > I scrubbed the ones I care about in this series.
>>> >
>>> > I considered "ease of change", "popular usage" and "performance
>>> > critical
>>> > path" as the determining criteria for my filtering.
>>> >
>>> > We used relaxed API heavily on ARM for a long time but
>>> > it did not exist on other architectures. For this reason, relaxed
>>> > architectures have been paying double penalty in order to use the
>>> > common
>>> > drivers.
>>> >
>>> > Now that relaxed API is present on all architectures, we can go and
>>> > scrub
>>> > all drivers to see what needs to change and what can remain.
>>> >
>>> > We start with mostly used ones and hope to increase the coverage over
>>> > time.
>>> > It will take a while to cover all drivers.
>>> >
>>> > Feel free to apply patches individually.
>>>
>>> I looked over the set and they seem good.
>>>
>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
>> branch).? I will deal with this series in a day or two, after I have dealt
>> with my driver pull requests.
> 
> Sorry, you will have to replace the ones you took from me.

Double sorry now.

I don't know if you have been following "RFC on writel and writel_relaxed" thread
or not but there are some new developments about wmb() requirement. 

Basically, wmb() should never be used before writel() as writel() seem to
provide coherency and observability guarantee.

wmb()+writel_relaxed() is slower on some architectures than plain writel()

I'll have to rework these patches to have writel() only. 

Are you able to drop the applied ones so that I can post V8 or is it too late?


Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 12:42         ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 12:42 UTC (permalink / raw)
  To: intel-wired-lan

Jeff,

On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
> On 2018-03-23 19:58, Jeff Kirsher wrote:
>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org>
>>> wrote:
>>> > Code includes wmb() followed by writel() in multiple places. writel()
>>> > already has a barrier on some architectures like arm64.
>>> >
>>> > This ends up CPU observing two barriers back to back before executing
>>> > the
>>> > register write.
>>> >
>>> > Since code already has an explicit barrier call, changing writel() to
>>> > writel_relaxed().
>>> >
>>> > I did a regex search for wmb() followed by writel() in each drivers
>>> > directory.
>>> > I scrubbed the ones I care about in this series.
>>> >
>>> > I considered "ease of change", "popular usage" and "performance
>>> > critical
>>> > path" as the determining criteria for my filtering.
>>> >
>>> > We used relaxed API heavily on ARM for a long time but
>>> > it did not exist on other architectures. For this reason, relaxed
>>> > architectures have been paying double penalty in order to use the
>>> > common
>>> > drivers.
>>> >
>>> > Now that relaxed API is present on all architectures, we can go and
>>> > scrub
>>> > all drivers to see what needs to change and what can remain.
>>> >
>>> > We start with mostly used ones and hope to increase the coverage over
>>> > time.
>>> > It will take a while to cover all drivers.
>>> >
>>> > Feel free to apply patches individually.
>>>
>>> I looked over the set and they seem good.
>>>
>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
>> branch).? I will deal with this series in a day or two, after I have dealt
>> with my driver pull requests.
> 
> Sorry, you will have to replace the ones you took from me.

Double sorry now.

I don't know if you have been following "RFC on writel and writel_relaxed" thread
or not but there are some new developments about wmb() requirement. 

Basically, wmb() should never be used before writel() as writel() seem to
provide coherency and observability guarantee.

wmb()+writel_relaxed() is slower on some architectures than plain writel()

I'll have to rework these patches to have writel() only. 

Are you able to drop the applied ones so that I can post V8 or is it too late?


Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 12:42         ` Sinan Kaya
  (?)
@ 2018-03-27 14:04           ` Lino Sanfilippo
  -1 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:04 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Jeff Kirsher, Alexander Duyck, intel-wired-lan, Netdev,
	Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

Hi,

> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 

Just out of interest: Where can this thread be found?

> 
> Basically, wmb() should never be used before writel() as writel() seem to
> provide coherency and observability guarantee.
> 

AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
where the writel() implementation contains a wmb() but not for all).
For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
Is this not correct? 

Regards,
Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:04           ` Lino Sanfilippo
  0 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 

Just out of interest: Where can this thread be found?

> 
> Basically, wmb() should never be used before writel() as writel() seem to
> provide coherency and observability guarantee.
> 

AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
where the writel() implementation contains a wmb() but not for all).
For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
Is this not correct? 

Regards,
Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:04           ` Lino Sanfilippo
  0 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:04 UTC (permalink / raw)
  To: intel-wired-lan

Hi,

> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 

Just out of interest: Where can this thread be found?

> 
> Basically, wmb() should never be used before writel() as writel() seem to
> provide coherency and observability guarantee.
> 

AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
where the writel() implementation contains a wmb() but not for all).
For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
Is this not correct? 

Regards,
Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 14:04           ` Lino Sanfilippo
  (?)
@ 2018-03-27 14:23             ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:23 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Jeff Kirsher, Alexander Duyck, intel-wired-lan, Netdev,
	Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> Hi,
> 
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
> 
> Just out of interest: Where can this thread be found?

https://www.spinics.net/lists/linux-rdma/msg62570.html

https://patchwork.kernel.org/patch/10309913/


> 
>>
>> Basically, wmb() should never be used before writel() as writel() seem to
>> provide coherency and observability guarantee.
>>
> 
> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> where the writel() implementation contains a wmb() but not for all).
> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> Is this not correct? 

We are being told that if you use writel(), then you don't need a wmb() on
all architectures.

Jason is seeking behavior clarification for write combined buffers.

> 
> Regards,
> Lino
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:23             ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> Hi,
> 
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
> 
> Just out of interest: Where can this thread be found?

https://www.spinics.net/lists/linux-rdma/msg62570.html

https://patchwork.kernel.org/patch/10309913/


> 
>>
>> Basically, wmb() should never be used before writel() as writel() seem to
>> provide coherency and observability guarantee.
>>
> 
> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> where the writel() implementation contains a wmb() but not for all).
> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> Is this not correct? 

We are being told that if you use writel(), then you don't need a wmb() on
all architectures.

Jason is seeking behavior clarification for write combined buffers.

> 
> Regards,
> Lino
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:23             ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:23 UTC (permalink / raw)
  To: intel-wired-lan

On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> Hi,
> 
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
> 
> Just out of interest: Where can this thread be found?

https://www.spinics.net/lists/linux-rdma/msg62570.html

https://patchwork.kernel.org/patch/10309913/


> 
>>
>> Basically, wmb() should never be used before writel() as writel() seem to
>> provide coherency and observability guarantee.
>>
> 
> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> where the writel() implementation contains a wmb() but not for all).
> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> Is this not correct? 

We are being told that if you use writel(), then you don't need a wmb() on
all architectures.

Jason is seeking behavior clarification for write combined buffers.

> 
> Regards,
> Lino
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Aw: Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 14:23             ` Sinan Kaya
  (?)
@ 2018-03-27 14:33               ` Lino Sanfilippo
  -1 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:33 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Jeff Kirsher, Alexander Duyck, intel-wired-lan, Netdev,
	Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel


>
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> > Hi,
> > 
> >> Double sorry now.
> >>
> >> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> >> or not but there are some new developments about wmb() requirement. 
> > 
> > Just out of interest: Where can this thread be found?
> 
> https://www.spinics.net/lists/linux-rdma/msg62570.html
> 
> https://patchwork.kernel.org/patch/10309913/
> 
> 
> > 
> >>
> >> Basically, wmb() should never be used before writel() as writel() seem to
> >> provide coherency and observability guarantee.
> >>
> > 
> > AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> > IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> > where the writel() implementation contains a wmb() but not for all).
> > For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> > Is this not correct? 
> 
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.
> 
> Jason is seeking behavior clarification for write combined buffers.
> 

Interesting, thanks for the information!

Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Aw: Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:33               ` Lino Sanfilippo
  0 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:33 UTC (permalink / raw)
  To: linux-arm-kernel


>
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> > Hi,
> > 
> >> Double sorry now.
> >>
> >> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> >> or not but there are some new developments about wmb() requirement. 
> > 
> > Just out of interest: Where can this thread be found?
> 
> https://www.spinics.net/lists/linux-rdma/msg62570.html
> 
> https://patchwork.kernel.org/patch/10309913/
> 
> 
> > 
> >>
> >> Basically, wmb() should never be used before writel() as writel() seem to
> >> provide coherency and observability guarantee.
> >>
> > 
> > AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> > IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> > where the writel() implementation contains a wmb() but not for all).
> > For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> > Is this not correct? 
> 
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.
> 
> Jason is seeking behavior clarification for write combined buffers.
> 

Interesting, thanks for the information!

Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:33               ` Lino Sanfilippo
  0 siblings, 0 replies; 58+ messages in thread
From: Lino Sanfilippo @ 2018-03-27 14:33 UTC (permalink / raw)
  To: intel-wired-lan


>
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
> > Hi,
> > 
> >> Double sorry now.
> >>
> >> I don't know if you have been following "RFC on writel and writel_relaxed" thread
> >> or not but there are some new developments about wmb() requirement. 
> > 
> > Just out of interest: Where can this thread be found?
> 
> https://www.spinics.net/lists/linux-rdma/msg62570.html
> 
> https://patchwork.kernel.org/patch/10309913/
> 
> 
> > 
> >>
> >> Basically, wmb() should never be used before writel() as writel() seem to
> >> provide coherency and observability guarantee.
> >>
> > 
> > AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
> > IO-memory not RAM vs. IO-memory (this may be the case for some architectures 
> > where the writel() implementation contains a wmb() but not for all).
> > For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used. 
> > Is this not correct? 
> 
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.
> 
> Jason is seeking behavior clarification for write combined buffers.
> 

Interesting, thanks for the information!

Lino

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 14:23             ` Sinan Kaya
  (?)
@ 2018-03-27 14:38               ` Alexander Duyck
  -1 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-27 14:38 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Lino Sanfilippo, Jeff Kirsher, intel-wired-lan, Netdev,
	Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

On Tue, Mar 27, 2018 at 7:23 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
>> Hi,
>>
>>> Double sorry now.
>>>
>>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>>> or not but there are some new developments about wmb() requirement.
>>
>> Just out of interest: Where can this thread be found?
>
> https://www.spinics.net/lists/linux-rdma/msg62570.html
>
> https://patchwork.kernel.org/patch/10309913/
>
>
>>
>>>
>>> Basically, wmb() should never be used before writel() as writel() seem to
>>> provide coherency and observability guarantee.
>>>
>>
>> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
>> IO-memory not RAM vs. IO-memory (this may be the case for some architectures
>> where the writel() implementation contains a wmb() but not for all).
>> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used.
>> Is this not correct?
>
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.

I'm not sure who told you that but that is incorrect, at least for
x86. If you attempt to use writel() without the wmb() we will have to
NAK the patches. We will accept the wmb() with writel_releaxed() since
that solves things for ARM.

> Jason is seeking behavior clarification for write combined buffers.

Don't bother. I can tell you right now that for x86 you have to have a
wmb() before the writel().

Based on the comment in
(https://www.spinics.net/lists/linux-rdma/msg62666.html):
    Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
    PPC, it will just not give you a benefit today.

I say the patch set stays. This gives benefit on ARM, and has no
effect on x86 and PowerPC. If you want to look at trying to optimize
things further on PowerPC and such then go for it in terms of trying
to implement the writel_relaxed(). Otherwise I say we call the ARM
goodness a win and don't get ourselves too wrapped up in trying to fix
this for all architectures.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:38               ` Alexander Duyck
  0 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-27 14:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 27, 2018 at 7:23 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
>> Hi,
>>
>>> Double sorry now.
>>>
>>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>>> or not but there are some new developments about wmb() requirement.
>>
>> Just out of interest: Where can this thread be found?
>
> https://www.spinics.net/lists/linux-rdma/msg62570.html
>
> https://patchwork.kernel.org/patch/10309913/
>
>
>>
>>>
>>> Basically, wmb() should never be used before writel() as writel() seem to
>>> provide coherency and observability guarantee.
>>>
>>
>> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
>> IO-memory not RAM vs. IO-memory (this may be the case for some architectures
>> where the writel() implementation contains a wmb() but not for all).
>> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used.
>> Is this not correct?
>
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.

I'm not sure who told you that but that is incorrect, at least for
x86. If you attempt to use writel() without the wmb() we will have to
NAK the patches. We will accept the wmb() with writel_releaxed() since
that solves things for ARM.

> Jason is seeking behavior clarification for write combined buffers.

Don't bother. I can tell you right now that for x86 you have to have a
wmb() before the writel().

Based on the comment in
(https://www.spinics.net/lists/linux-rdma/msg62666.html):
    Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
    PPC, it will just not give you a benefit today.

I say the patch set stays. This gives benefit on ARM, and has no
effect on x86 and PowerPC. If you want to look at trying to optimize
things further on PowerPC and such then go for it in terms of trying
to implement the writel_relaxed(). Otherwise I say we call the ARM
goodness a win and don't get ourselves too wrapped up in trying to fix
this for all architectures.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:38               ` Alexander Duyck
  0 siblings, 0 replies; 58+ messages in thread
From: Alexander Duyck @ 2018-03-27 14:38 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, Mar 27, 2018 at 7:23 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/27/2018 10:04 AM, Lino Sanfilippo wrote:
>> Hi,
>>
>>> Double sorry now.
>>>
>>> I don't know if you have been following "RFC on writel and writel_relaxed" thread
>>> or not but there are some new developments about wmb() requirement.
>>
>> Just out of interest: Where can this thread be found?
>
> https://www.spinics.net/lists/linux-rdma/msg62570.html
>
> https://patchwork.kernel.org/patch/10309913/
>
>
>>
>>>
>>> Basically, wmb() should never be used before writel() as writel() seem to
>>> provide coherency and observability guarantee.
>>>
>>
>> AFAIU memory-barriers.txt writel() only guarantees correct order of accesses to
>> IO-memory not RAM vs. IO-memory (this may be the case for some architectures
>> where the writel() implementation contains a wmb() but not for all).
>> For the RAM vs. IO-memory case at least a a wmb()/rmb() has to be used.
>> Is this not correct?
>
> We are being told that if you use writel(), then you don't need a wmb() on
> all architectures.

I'm not sure who told you that but that is incorrect, at least for
x86. If you attempt to use writel() without the wmb() we will have to
NAK the patches. We will accept the wmb() with writel_releaxed() since
that solves things for ARM.

> Jason is seeking behavior clarification for write combined buffers.

Don't bother. I can tell you right now that for x86 you have to have a
wmb() before the writel().

Based on the comment in
(https://www.spinics.net/lists/linux-rdma/msg62666.html):
    Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
    PPC, it will just not give you a benefit today.

I say the patch set stays. This gives benefit on ARM, and has no
effect on x86 and PowerPC. If you want to look at trying to optimize
things further on PowerPC and such then go for it in terms of trying
to implement the writel_relaxed(). Otherwise I say we call the ARM
goodness a win and don't get ourselves too wrapped up in trying to fix
this for all architectures.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 14:38               ` Alexander Duyck
  (?)
@ 2018-03-27 14:48                 ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:48 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lino Sanfilippo, Jeff Kirsher, intel-wired-lan, Netdev,
	Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel

On 3/27/2018 10:38 AM, Alexander Duyck wrote:
>> We are being told that if you use writel(), then you don't need a wmb() on
>> all architectures.
> I'm not sure who told you that but that is incorrect, at least for
> x86. If you attempt to use writel() without the wmb() we will have to
> NAK the patches. We will accept the wmb() with writel_releaxed() since
> that solves things for ARM.
> 

I added netdev and you to the RFC on writel and writel_relaxed list. 
Feel free to raise your concerns.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:48                 ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/27/2018 10:38 AM, Alexander Duyck wrote:
>> We are being told that if you use writel(), then you don't need a wmb() on
>> all architectures.
> I'm not sure who told you that but that is incorrect, at least for
> x86. If you attempt to use writel() without the wmb() we will have to
> NAK the patches. We will accept the wmb() with writel_releaxed() since
> that solves things for ARM.
> 

I added netdev and you to the RFC on writel and writel_relaxed list. 
Feel free to raise your concerns.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 14:48                 ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 14:48 UTC (permalink / raw)
  To: intel-wired-lan

On 3/27/2018 10:38 AM, Alexander Duyck wrote:
>> We are being told that if you use writel(), then you don't need a wmb() on
>> all architectures.
> I'm not sure who told you that but that is incorrect, at least for
> x86. If you attempt to use writel() without the wmb() we will have to
> NAK the patches. We will accept the wmb() with writel_releaxed() since
> that solves things for ARM.
> 

I added netdev and you to the RFC on writel and writel_relaxed list. 
Feel free to raise your concerns.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 12:42         ` Sinan Kaya
  (?)
@ 2018-03-27 16:54           ` Jeff Kirsher
  -1 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-27 16:54 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: sulrich, Netdev, Timur Tabi, Alexander Duyck, intel-wired-lan,
	linux-arm-msm, linux-arm-kernel


[-- Attachment #1.1: Type: text/plain, Size: 3057 bytes --]

On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
> On 3/23/2018 10:34 PM, okaya@codeaurora.org wrote:
> > On 2018-03-23 19:58, Jeff Kirsher wrote:
> > > On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> > > > On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
> > > > org>
> > > > wrote:
> > > > > Code includes wmb() followed by writel() in multiple places.
> > > > > writel()
> > > > > already has a barrier on some architectures like arm64.
> > > > > 
> > > > > This ends up CPU observing two barriers back to back before
> > > > > executing
> > > > > the
> > > > > register write.
> > > > > 
> > > > > Since code already has an explicit barrier call, changing
> > > > > writel() to
> > > > > writel_relaxed().
> > > > > 
> > > > > I did a regex search for wmb() followed by writel() in each
> > > > > drivers
> > > > > directory.
> > > > > I scrubbed the ones I care about in this series.
> > > > > 
> > > > > I considered "ease of change", "popular usage" and
> > > > > "performance
> > > > > critical
> > > > > path" as the determining criteria for my filtering.
> > > > > 
> > > > > We used relaxed API heavily on ARM for a long time but
> > > > > it did not exist on other architectures. For this reason,
> > > > > relaxed
> > > > > architectures have been paying double penalty in order to use
> > > > > the
> > > > > common
> > > > > drivers.
> > > > > 
> > > > > Now that relaxed API is present on all architectures, we can
> > > > > go and
> > > > > scrub
> > > > > all drivers to see what needs to change and what can remain.
> > > > > 
> > > > > We start with mostly used ones and hope to increase the
> > > > > coverage over
> > > > > time.
> > > > > It will take a while to cover all drivers.
> > > > > 
> > > > > Feel free to apply patches individually.
> > > > 
> > > > I looked over the set and they seem good.
> > > > 
> > > > Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > > 
> > > Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
> > > queue
> > > branch).  I will deal with this series in a day or two, after I
> > > have dealt
> > > with my driver pull requests.
> > 
> > Sorry, you will have to replace the ones you took from me.
> 
> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and
> writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 
> 
> Basically, wmb() should never be used before writel() as writel()
> seem to
> provide coherency and observability guarantee.
> 
> wmb()+writel_relaxed() is slower on some architectures than plain
> writel()
> 
> I'll have to rework these patches to have writel() only. 
> 
> Are you able to drop the applied ones so that I can post V8 or is it
> too late?

Currently I do not have any of your patches applied to my next-queue
tree (dev-queue branch).  So feel free to do any revisions you need to
do and to re-submit to intel-wired-lan@lists.osuosl.org (IWL) mailing
list.

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 16:54           ` Jeff Kirsher
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-27 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
> On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
> > On 2018-03-23 19:58, Jeff Kirsher wrote:
> > > On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> > > > On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
> > > > org>
> > > > wrote:
> > > > > Code includes wmb() followed by writel() in multiple places.
> > > > > writel()
> > > > > already has a barrier on some architectures like arm64.
> > > > > 
> > > > > This ends up CPU observing two barriers back to back before
> > > > > executing
> > > > > the
> > > > > register write.
> > > > > 
> > > > > Since code already has an explicit barrier call, changing
> > > > > writel() to
> > > > > writel_relaxed().
> > > > > 
> > > > > I did a regex search for wmb() followed by writel() in each
> > > > > drivers
> > > > > directory.
> > > > > I scrubbed the ones I care about in this series.
> > > > > 
> > > > > I considered "ease of change", "popular usage" and
> > > > > "performance
> > > > > critical
> > > > > path" as the determining criteria for my filtering.
> > > > > 
> > > > > We used relaxed API heavily on ARM for a long time but
> > > > > it did not exist on other architectures. For this reason,
> > > > > relaxed
> > > > > architectures have been paying double penalty in order to use
> > > > > the
> > > > > common
> > > > > drivers.
> > > > > 
> > > > > Now that relaxed API is present on all architectures, we can
> > > > > go and
> > > > > scrub
> > > > > all drivers to see what needs to change and what can remain.
> > > > > 
> > > > > We start with mostly used ones and hope to increase the
> > > > > coverage over
> > > > > time.
> > > > > It will take a while to cover all drivers.
> > > > > 
> > > > > Feel free to apply patches individually.
> > > > 
> > > > I looked over the set and they seem good.
> > > > 
> > > > Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > > 
> > > Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
> > > queue
> > > branch).  I will deal with this series in a day or two, after I
> > > have dealt
> > > with my driver pull requests.
> > 
> > Sorry, you will have to replace the ones you took from me.
> 
> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and
> writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 
> 
> Basically, wmb() should never be used before writel() as writel()
> seem to
> provide coherency and observability guarantee.
> 
> wmb()+writel_relaxed() is slower on some architectures than plain
> writel()
> 
> I'll have to rework these patches to have writel() only. 
> 
> Are you able to drop the applied ones so that I can post V8 or is it
> too late?

Currently I do not have any of your patches applied to my next-queue
tree (dev-queue branch).  So feel free to do any revisions you need to
do and to re-submit to intel-wired-lan at lists.osuosl.org (IWL) mailing
list.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180327/a20aa26b/attachment.sig>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 16:54           ` Jeff Kirsher
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff Kirsher @ 2018-03-27 16:54 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
> On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
> > On 2018-03-23 19:58, Jeff Kirsher wrote:
> > > On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
> > > > On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
> > > > org>
> > > > wrote:
> > > > > Code includes wmb() followed by writel() in multiple places.
> > > > > writel()
> > > > > already has a barrier on some architectures like arm64.
> > > > > 
> > > > > This ends up CPU observing two barriers back to back before
> > > > > executing
> > > > > the
> > > > > register write.
> > > > > 
> > > > > Since code already has an explicit barrier call, changing
> > > > > writel() to
> > > > > writel_relaxed().
> > > > > 
> > > > > I did a regex search for wmb() followed by writel() in each
> > > > > drivers
> > > > > directory.
> > > > > I scrubbed the ones I care about in this series.
> > > > > 
> > > > > I considered "ease of change", "popular usage" and
> > > > > "performance
> > > > > critical
> > > > > path" as the determining criteria for my filtering.
> > > > > 
> > > > > We used relaxed API heavily on ARM for a long time but
> > > > > it did not exist on other architectures. For this reason,
> > > > > relaxed
> > > > > architectures have been paying double penalty in order to use
> > > > > the
> > > > > common
> > > > > drivers.
> > > > > 
> > > > > Now that relaxed API is present on all architectures, we can
> > > > > go and
> > > > > scrub
> > > > > all drivers to see what needs to change and what can remain.
> > > > > 
> > > > > We start with mostly used ones and hope to increase the
> > > > > coverage over
> > > > > time.
> > > > > It will take a while to cover all drivers.
> > > > > 
> > > > > Feel free to apply patches individually.
> > > > 
> > > > I looked over the set and they seem good.
> > > > 
> > > > Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > > 
> > > Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
> > > queue
> > > branch).  I will deal with this series in a day or two, after I
> > > have dealt
> > > with my driver pull requests.
> > 
> > Sorry, you will have to replace the ones you took from me.
> 
> Double sorry now.
> 
> I don't know if you have been following "RFC on writel and
> writel_relaxed" thread
> or not but there are some new developments about wmb() requirement. 
> 
> Basically, wmb() should never be used before writel() as writel()
> seem to
> provide coherency and observability guarantee.
> 
> wmb()+writel_relaxed() is slower on some architectures than plain
> writel()
> 
> I'll have to rework these patches to have writel() only. 
> 
> Are you able to drop the applied ones so that I can post V8 or is it
> too late?

Currently I do not have any of your patches applied to my next-queue
tree (dev-queue branch).  So feel free to do any revisions you need to
do and to re-submit to intel-wired-lan at lists.osuosl.org (IWL) mailing
list.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20180327/a20aa26b/attachment.asc>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
  2018-03-27 16:54           ` Jeff Kirsher
  (?)
@ 2018-03-27 17:33             ` Sinan Kaya
  -1 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 17:33 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: Alexander Duyck, intel-wired-lan, Netdev, Timur Tabi, sulrich,
	linux-arm-msm, linux-arm-kernel

On 3/27/2018 12:54 PM, Jeff Kirsher wrote:
> On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
>> On 3/23/2018 10:34 PM, okaya@codeaurora.org wrote:
>>> On 2018-03-23 19:58, Jeff Kirsher wrote:
>>>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
>>>>> org>
>>>>> wrote:
>>>>>> Code includes wmb() followed by writel() in multiple places.
>>>>>> writel()
>>>>>> already has a barrier on some architectures like arm64.
>>>>>>
>>>>>> This ends up CPU observing two barriers back to back before
>>>>>> executing
>>>>>> the
>>>>>> register write.
>>>>>>
>>>>>> Since code already has an explicit barrier call, changing
>>>>>> writel() to
>>>>>> writel_relaxed().
>>>>>>
>>>>>> I did a regex search for wmb() followed by writel() in each
>>>>>> drivers
>>>>>> directory.
>>>>>> I scrubbed the ones I care about in this series.
>>>>>>
>>>>>> I considered "ease of change", "popular usage" and
>>>>>> "performance
>>>>>> critical
>>>>>> path" as the determining criteria for my filtering.
>>>>>>
>>>>>> We used relaxed API heavily on ARM for a long time but
>>>>>> it did not exist on other architectures. For this reason,
>>>>>> relaxed
>>>>>> architectures have been paying double penalty in order to use
>>>>>> the
>>>>>> common
>>>>>> drivers.
>>>>>>
>>>>>> Now that relaxed API is present on all architectures, we can
>>>>>> go and
>>>>>> scrub
>>>>>> all drivers to see what needs to change and what can remain.
>>>>>>
>>>>>> We start with mostly used ones and hope to increase the
>>>>>> coverage over
>>>>>> time.
>>>>>> It will take a while to cover all drivers.
>>>>>>
>>>>>> Feel free to apply patches individually.
>>>>>
>>>>> I looked over the set and they seem good.
>>>>>
>>>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>>>
>>>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
>>>> queue
>>>> branch).  I will deal with this series in a day or two, after I
>>>> have dealt
>>>> with my driver pull requests.
>>>
>>> Sorry, you will have to replace the ones you took from me.
>>
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and
>> writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
>>
>> Basically, wmb() should never be used before writel() as writel()
>> seem to
>> provide coherency and observability guarantee.
>>
>> wmb()+writel_relaxed() is slower on some architectures than plain
>> writel()
>>
>> I'll have to rework these patches to have writel() only. 
>>
>> Are you able to drop the applied ones so that I can post V8 or is it
>> too late?
> 
> Currently I do not have any of your patches applied to my next-queue
> tree (dev-queue branch).  So feel free to do any revisions you need to
> do and to re-submit to intel-wired-lan@lists.osuosl.org (IWL) mailing
> list.
> 

Thanks, Good to know. 

I'm waiting for the discussion to settle. I'll update as soon as I get
a clear direction.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 17:33             ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 17:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 3/27/2018 12:54 PM, Jeff Kirsher wrote:
> On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
>> On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
>>> On 2018-03-23 19:58, Jeff Kirsher wrote:
>>>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
>>>>> org>
>>>>> wrote:
>>>>>> Code includes wmb() followed by writel() in multiple places.
>>>>>> writel()
>>>>>> already has a barrier on some architectures like arm64.
>>>>>>
>>>>>> This ends up CPU observing two barriers back to back before
>>>>>> executing
>>>>>> the
>>>>>> register write.
>>>>>>
>>>>>> Since code already has an explicit barrier call, changing
>>>>>> writel() to
>>>>>> writel_relaxed().
>>>>>>
>>>>>> I did a regex search for wmb() followed by writel() in each
>>>>>> drivers
>>>>>> directory.
>>>>>> I scrubbed the ones I care about in this series.
>>>>>>
>>>>>> I considered "ease of change", "popular usage" and
>>>>>> "performance
>>>>>> critical
>>>>>> path" as the determining criteria for my filtering.
>>>>>>
>>>>>> We used relaxed API heavily on ARM for a long time but
>>>>>> it did not exist on other architectures. For this reason,
>>>>>> relaxed
>>>>>> architectures have been paying double penalty in order to use
>>>>>> the
>>>>>> common
>>>>>> drivers.
>>>>>>
>>>>>> Now that relaxed API is present on all architectures, we can
>>>>>> go and
>>>>>> scrub
>>>>>> all drivers to see what needs to change and what can remain.
>>>>>>
>>>>>> We start with mostly used ones and hope to increase the
>>>>>> coverage over
>>>>>> time.
>>>>>> It will take a while to cover all drivers.
>>>>>>
>>>>>> Feel free to apply patches individually.
>>>>>
>>>>> I looked over the set and they seem good.
>>>>>
>>>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>>>
>>>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
>>>> queue
>>>> branch).  I will deal with this series in a day or two, after I
>>>> have dealt
>>>> with my driver pull requests.
>>>
>>> Sorry, you will have to replace the ones you took from me.
>>
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and
>> writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
>>
>> Basically, wmb() should never be used before writel() as writel()
>> seem to
>> provide coherency and observability guarantee.
>>
>> wmb()+writel_relaxed() is slower on some architectures than plain
>> writel()
>>
>> I'll have to rework these patches to have writel() only. 
>>
>> Are you able to drop the applied ones so that I can post V8 or is it
>> too late?
> 
> Currently I do not have any of your patches applied to my next-queue
> tree (dev-queue branch).  So feel free to do any revisions you need to
> do and to re-submit to intel-wired-lan at lists.osuosl.org (IWL) mailing
> list.
> 

Thanks, Good to know. 

I'm waiting for the discussion to settle. I'll update as soon as I get
a clear direction.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Intel-wired-lan] [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
@ 2018-03-27 17:33             ` Sinan Kaya
  0 siblings, 0 replies; 58+ messages in thread
From: Sinan Kaya @ 2018-03-27 17:33 UTC (permalink / raw)
  To: intel-wired-lan

On 3/27/2018 12:54 PM, Jeff Kirsher wrote:
> On Tue, 2018-03-27 at 08:42 -0400, Sinan Kaya wrote:
>> On 3/23/2018 10:34 PM, okaya at codeaurora.org wrote:
>>> On 2018-03-23 19:58, Jeff Kirsher wrote:
>>>> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>>>>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.
>>>>> org>
>>>>> wrote:
>>>>>> Code includes wmb() followed by writel() in multiple places.
>>>>>> writel()
>>>>>> already has a barrier on some architectures like arm64.
>>>>>>
>>>>>> This ends up CPU observing two barriers back to back before
>>>>>> executing
>>>>>> the
>>>>>> register write.
>>>>>>
>>>>>> Since code already has an explicit barrier call, changing
>>>>>> writel() to
>>>>>> writel_relaxed().
>>>>>>
>>>>>> I did a regex search for wmb() followed by writel() in each
>>>>>> drivers
>>>>>> directory.
>>>>>> I scrubbed the ones I care about in this series.
>>>>>>
>>>>>> I considered "ease of change", "popular usage" and
>>>>>> "performance
>>>>>> critical
>>>>>> path" as the determining criteria for my filtering.
>>>>>>
>>>>>> We used relaxed API heavily on ARM for a long time but
>>>>>> it did not exist on other architectures. For this reason,
>>>>>> relaxed
>>>>>> architectures have been paying double penalty in order to use
>>>>>> the
>>>>>> common
>>>>>> drivers.
>>>>>>
>>>>>> Now that relaxed API is present on all architectures, we can
>>>>>> go and
>>>>>> scrub
>>>>>> all drivers to see what needs to change and what can remain.
>>>>>>
>>>>>> We start with mostly used ones and hope to increase the
>>>>>> coverage over
>>>>>> time.
>>>>>> It will take a while to cover all drivers.
>>>>>>
>>>>>> Feel free to apply patches individually.
>>>>>
>>>>> I looked over the set and they seem good.
>>>>>
>>>>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>>>
>>>> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-
>>>> queue
>>>> branch).  I will deal with this series in a day or two, after I
>>>> have dealt
>>>> with my driver pull requests.
>>>
>>> Sorry, you will have to replace the ones you took from me.
>>
>> Double sorry now.
>>
>> I don't know if you have been following "RFC on writel and
>> writel_relaxed" thread
>> or not but there are some new developments about wmb() requirement. 
>>
>> Basically, wmb() should never be used before writel() as writel()
>> seem to
>> provide coherency and observability guarantee.
>>
>> wmb()+writel_relaxed() is slower on some architectures than plain
>> writel()
>>
>> I'll have to rework these patches to have writel() only. 
>>
>> Are you able to drop the applied ones so that I can post V8 or is it
>> too late?
> 
> Currently I do not have any of your patches applied to my next-queue
> tree (dev-queue branch).  So feel free to do any revisions you need to
> do and to re-submit to intel-wired-lan at lists.osuosl.org (IWL) mailing
> list.
> 

Thanks, Good to know. 

I'm waiting for the discussion to settle. I'll update as soon as I get
a clear direction.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2018-03-27 17:33 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-23 18:52 [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-23 18:52 ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 1/7] i40e/i40evf: " Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 2/7] ixgbe: eliminate " Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 3/7] igbvf: " Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 4/7] igb: " Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 5/7] fm10k: Eliminate " Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:52 ` [PATCH v7 6/7] ixgbevf: keep writel() closer to wmb() Sinan Kaya
2018-03-23 18:52   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:52   ` Sinan Kaya
2018-03-23 18:53 ` [PATCH v7 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs Sinan Kaya
2018-03-23 18:53   ` [Intel-wired-lan] " Sinan Kaya
2018-03-23 18:53   ` Sinan Kaya
2018-03-23 21:53 ` [PATCH v7 0/7] netdev: intel: Eliminate " Alexander Duyck
2018-03-23 21:53   ` [Intel-wired-lan] " Alexander Duyck
2018-03-23 21:53   ` Alexander Duyck
2018-03-23 23:58   ` Jeff Kirsher
2018-03-23 23:58     ` [Intel-wired-lan] " Jeff Kirsher
2018-03-23 23:58     ` Jeff Kirsher
2018-03-24  2:34     ` okaya
2018-03-24  2:34       ` [Intel-wired-lan] " okaya
2018-03-24  2:34       ` okaya at codeaurora.org
2018-03-27 12:42       ` Sinan Kaya
2018-03-27 12:42         ` [Intel-wired-lan] " Sinan Kaya
2018-03-27 12:42         ` Sinan Kaya
2018-03-27 14:04         ` Lino Sanfilippo
2018-03-27 14:04           ` [Intel-wired-lan] " Lino Sanfilippo
2018-03-27 14:04           ` Lino Sanfilippo
2018-03-27 14:23           ` Sinan Kaya
2018-03-27 14:23             ` [Intel-wired-lan] " Sinan Kaya
2018-03-27 14:23             ` Sinan Kaya
2018-03-27 14:33             ` Aw: " Lino Sanfilippo
2018-03-27 14:33               ` [Intel-wired-lan] " Lino Sanfilippo
2018-03-27 14:33               ` Aw: " Lino Sanfilippo
2018-03-27 14:38             ` Alexander Duyck
2018-03-27 14:38               ` [Intel-wired-lan] " Alexander Duyck
2018-03-27 14:38               ` Alexander Duyck
2018-03-27 14:48               ` Sinan Kaya
2018-03-27 14:48                 ` [Intel-wired-lan] " Sinan Kaya
2018-03-27 14:48                 ` Sinan Kaya
2018-03-27 16:54         ` Jeff Kirsher
2018-03-27 16:54           ` [Intel-wired-lan] " Jeff Kirsher
2018-03-27 16:54           ` Jeff Kirsher
2018-03-27 17:33           ` Sinan Kaya
2018-03-27 17:33             ` [Intel-wired-lan] " Sinan Kaya
2018-03-27 17:33             ` Sinan Kaya

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.