netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/7] net: lockless stop/wake combo macros
@ 2023-04-07  1:25 Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
The original code dates back all the way to e1000 and Linux 2.6.19.

v4:
 - adjust a comment in patch 4
 - use IS_ENABLED() in patch 7
v3: https://lore.kernel.org/all/20230405223134.94665-1-kuba@kernel.org/
 - render the info as part of documentation, maybe someone will
   notice and use it (patches 1, 2, 3 are new)
 - use the __after_atomic barrier
 - add last patch to avoid a barrier in the wake path
more detailed change logs in the patches.

v2: https://lore.kernel.org/all/20230401051221.3160913-2-kuba@kernel.org/
 - really flip the unlikely into a likely in __netif_tx_queue_maybe_wake()
 - convert if / else into pre-init of _ret
v1: https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
 - perdicate -> predicate
 - on race use start instead of wake and make a note of that
   in the doc / comment at the start
rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/

Jakub Kicinski (7):
  docs: net: reformat driver.rst from a list to sections
  docs: net: move the probe and open/close sections of driver.rst up
  docs: net: use C syntax highlight in driver.rst
  net: provide macros for commonly copied lockless queue stop/wake code
  ixgbe: use new queue try_stop/try_wake macros
  bnxt: use new queue try_stop/try_wake macros
  net: piggy back on the memory barrier in bql when waking queues

 Documentation/networking/driver.rst           | 119 ++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  42 +----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  42 ++---
 include/linux/netdevice.h                     |   3 +-
 include/net/netdev_queues.h                   | 163 ++++++++++++++++++
 5 files changed, 262 insertions(+), 107 deletions(-)
 create mode 100644 include/net/netdev_queues.h

-- 
2.39.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 1/7] docs: net: reformat driver.rst from a list to sections
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

driver.rst had a historical form of list of common problems.
In the age os Sphinx and rendered documentation it's better
to use the more usual title + text format.

This will allow us to render kdoc into the output more naturally.

No changes to the actual text.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 91 ++++++++++++++++++-----------
 1 file changed, 56 insertions(+), 35 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 64f7236ff10b..3040a74d421c 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -4,15 +4,19 @@
 Softnet Driver Issues
 =====================
 
-Transmit path guidelines:
+Transmit path guidelines
+========================
 
-1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under
-   any normal circumstances.  It is considered a hard error unless
-   there is no way your device can tell ahead of time when its
-   transmit function will become busy.
+Stop queues in advance
+----------------------
 
-   Instead it must maintain the queue properly.  For example,
-   for a driver implementing scatter-gather this means::
+The ndo_start_xmit method must not return NETDEV_TX_BUSY under
+any normal circumstances.  It is considered a hard error unless
+there is no way your device can tell ahead of time when its
+transmit function will become busy.
+
+Instead it must maintain the queue properly.  For example,
+for a driver implementing scatter-gather this means::
 
 	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
 					       struct net_device *dev)
@@ -42,56 +46,73 @@ Softnet Driver Issues
 		return NETDEV_TX_OK;
 	}
 
-   And then at the end of your TX reclamation event handling::
+And then at the end of your TX reclamation event handling::
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(dp->dev);
 
-   For a non-scatter-gather supporting card, the three tests simply become::
+For a non-scatter-gather supporting card, the three tests simply become::
 
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= 0)
 
-   and::
+and::
 
 		if (TX_BUFFS_AVAIL(dp) == 0)
 
-   and::
+and::
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > 0)
 		netif_wake_queue(dp->dev);
 
-2) An ndo_start_xmit method must not modify the shared parts of a
-   cloned SKB.
+No exclusive ownership
+----------------------
+
+An ndo_start_xmit method must not modify the shared parts of a
+cloned SKB.
+
+Timely completions
+------------------
+
+Do not forget that once you return NETDEV_TX_OK from your
+ndo_start_xmit method, it is your driver's responsibility to free
+up the SKB and in some finite amount of time.
 
-3) Do not forget that once you return NETDEV_TX_OK from your
-   ndo_start_xmit method, it is your driver's responsibility to free
-   up the SKB and in some finite amount of time.
+For example, this means that it is not allowed for your TX
+mitigation scheme to let TX packets "hang out" in the TX
+ring unreclaimed forever if no new TX packets are sent.
+This error can deadlock sockets waiting for send buffer room
+to be freed up.
 
-   For example, this means that it is not allowed for your TX
-   mitigation scheme to let TX packets "hang out" in the TX
-   ring unreclaimed forever if no new TX packets are sent.
-   This error can deadlock sockets waiting for send buffer room
-   to be freed up.
+If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
+must not keep any reference to that SKB and you must not attempt
+to free it up.
 
-   If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
-   must not keep any reference to that SKB and you must not attempt
-   to free it up.
+Probing guidelines
+==================
 
-Probing guidelines:
+Address validation
+------------------
+
+Any hardware layer address you obtain for your device should
+be verified.  For example, for ethernet check it with
+linux/etherdevice.h:is_valid_ether_addr()
+
+Close/stop guidelines
+=====================
 
-1) Any hardware layer address you obtain for your device should
-   be verified.  For example, for ethernet check it with
-   linux/etherdevice.h:is_valid_ether_addr()
+Quiescence
+----------
 
-Close/stop guidelines:
+After the ndo_stop routine has been called, the hardware must
+not receive or transmit any data.  All in flight packets must
+be aborted. If necessary, poll or wait for completion of
+any reset commands.
 
-1) After the ndo_stop routine has been called, the hardware must
-   not receive or transmit any data.  All in flight packets must
-   be aborted. If necessary, poll or wait for completion of
-   any reset commands.
+Auto-close
+----------
 
-2) The ndo_stop routine will be called by unregister_netdevice
-   if device is still UP.
+The ndo_stop routine will be called by unregister_netdevice
+if device is still UP.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 2/7] docs: net: move the probe and open/close sections of driver.rst up
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

Somehow it feels more right to start from the probe then open,
then tx... Much like the lifetime of the driver itself.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 54 ++++++++++++++---------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 3040a74d421c..bfbd66871bb3 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -4,6 +4,33 @@
 Softnet Driver Issues
 =====================
 
+Probing guidelines
+==================
+
+Address validation
+------------------
+
+Any hardware layer address you obtain for your device should
+be verified.  For example, for ethernet check it with
+linux/etherdevice.h:is_valid_ether_addr()
+
+Close/stop guidelines
+=====================
+
+Quiescence
+----------
+
+After the ndo_stop routine has been called, the hardware must
+not receive or transmit any data.  All in flight packets must
+be aborted. If necessary, poll or wait for completion of
+any reset commands.
+
+Auto-close
+----------
+
+The ndo_stop routine will be called by unregister_netdevice
+if device is still UP.
+
 Transmit path guidelines
 ========================
 
@@ -89,30 +116,3 @@ to be freed up.
 If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
 must not keep any reference to that SKB and you must not attempt
 to free it up.
-
-Probing guidelines
-==================
-
-Address validation
-------------------
-
-Any hardware layer address you obtain for your device should
-be verified.  For example, for ethernet check it with
-linux/etherdevice.h:is_valid_ether_addr()
-
-Close/stop guidelines
-=====================
-
-Quiescence
-----------
-
-After the ndo_stop routine has been called, the hardware must
-not receive or transmit any data.  All in flight packets must
-be aborted. If necessary, poll or wait for completion of
-any reset commands.
-
-Auto-close
-----------
-
-The ndo_stop routine will be called by unregister_netdevice
-if device is still UP.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 3/7] docs: net: use C syntax highlight in driver.rst
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

Use syntax highlight, comment out the "..." since they are
not valid C.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 30 +++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index bfbd66871bb3..19c363291d04 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -43,7 +43,9 @@ there is no way your device can tell ahead of time when its
 transmit function will become busy.
 
 Instead it must maintain the queue properly.  For example,
-for a driver implementing scatter-gather this means::
+for a driver implementing scatter-gather this means:
+
+.. code-block:: c
 
 	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
 					       struct net_device *dev)
@@ -51,7 +53,7 @@ Instead it must maintain the queue properly.  For example,
 		struct drv *dp = netdev_priv(dev);
 
 		lock_tx(dp);
-		...
+		//...
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 			netif_stop_queue(dev);
@@ -61,34 +63,42 @@ Instead it must maintain the queue properly.  For example,
 			return NETDEV_TX_BUSY;
 		}
 
-		... queue packet to card ...
-		... update tx consumer index ...
+		//... queue packet to card ...
+		//... update tx consumer index ...
 
 		if (TX_BUFFS_AVAIL(dp) <= (MAX_SKB_FRAGS + 1))
 			netif_stop_queue(dev);
 
-		...
+		//...
 		unlock_tx(dp);
-		...
+		//...
 		return NETDEV_TX_OK;
 	}
 
-And then at the end of your TX reclamation event handling::
+And then at the end of your TX reclamation event handling:
+
+.. code-block:: c
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(dp->dev);
 
-For a non-scatter-gather supporting card, the three tests simply become::
+For a non-scatter-gather supporting card, the three tests simply become:
+
+.. code-block:: c
 
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= 0)
 
-and::
+and:
+
+.. code-block:: c
 
 		if (TX_BUFFS_AVAIL(dp) == 0)
 
-and::
+and:
+
+.. code-block:: c
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > 0)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 4/7] net: provide macros for commonly copied lockless queue stop/wake code
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (2 preceding siblings ...)
  2023-04-07  1:25 ` [PATCH net-next v4 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
The original code dates back all the way to e1000 and Linux 2.6.19.

Smaller drivers shy away from the scheme and introduce a lock
which may cause deadlocks in netpoll.

Provide macros which encapsulate the necessary logic.

The macros do not prevent false wake ups, the extra barrier
required to close that race is not worth it. See discussion in:
https://lore.kernel.org/all/c39312a2-4537-14b4-270c-9fe1fbb91e89@gmail.com/

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v4:
 - stopped -> unchanged in kdoc
v3: https://lore.kernel.org/all/20230405223134.94665-1-kuba@kernel.org/
 - use smp_mb__after_atomic()
 - improve the comments on barriers
 - document the possibility of false wakes
 - rename netif_tx_queue_* -> netif_txq_*
v2: https://lore.kernel.org/all/20230401051221.3160913-2-kuba@kernel.org/
 - really flip the unlikely into a likely in __netif_tx_queue_maybe_wake()
 - convert if / else into pre-init of _ret
v1: https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
 - perdicate -> predicate
 - on race use start instead of wake and make a note of that
   in the doc / comment at the start
rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
---
 Documentation/networking/driver.rst |   6 ++
 include/linux/netdevice.h           |   1 +
 include/net/netdev_queues.h         | 144 ++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)
 create mode 100644 include/net/netdev_queues.h

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 19c363291d04..4071f2c00f8b 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -104,6 +104,12 @@ Instead it must maintain the queue properly.  For example,
 	    TX_BUFFS_AVAIL(dp) > 0)
 		netif_wake_queue(dp->dev);
 
+Lockless queue stop / wake helper macros
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/net/netdev_queues.h
+   :doc: Lockless queue stopping / waking helpers.
+
 No exclusive ownership
 ----------------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a740be3bb911..18770d325499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3341,6 +3341,7 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev)
 
 static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 {
+	/* Must be an atomic op see netif_txq_try_stop() */
 	set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
 
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
new file mode 100644
index 000000000000..5236d78bbdeb
--- /dev/null
+++ b/include/net/netdev_queues.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NET_QUEUES_H
+#define _LINUX_NET_QUEUES_H
+
+#include <linux/netdevice.h>
+
+/**
+ * DOC: Lockless queue stopping / waking helpers.
+ *
+ * The netif_txq_maybe_stop() and __netif_txq_completed_wake()
+ * macros are designed to safely implement stopping
+ * and waking netdev queues without full lock protection.
+ *
+ * We assume that there can be no concurrent stop attempts and no concurrent
+ * wake attempts. The try-stop should happen from the xmit handler,
+ * while wake up should be triggered from NAPI poll context.
+ * The two may run concurrently (single producer, single consumer).
+ *
+ * The try-stop side is expected to run from the xmit handler and therefore
+ * it does not reschedule Tx (netif_tx_start_queue() instead of
+ * netif_tx_wake_queue()). Uses of the ``stop`` macros outside of the xmit
+ * handler may lead to xmit queue being enabled but not run.
+ * The waking side does not have similar context restrictions.
+ *
+ * The macros guarantee that rings will not remain stopped if there's
+ * space available, but they do *not* prevent false wake ups when
+ * the ring is full! Drivers should check for ring full at the start
+ * for the xmit handler.
+ *
+ * All descriptor ring indexes (and other relevant shared state) must
+ * be updated before invoking the macros.
+ */
+
+#define netif_txq_try_stop(txq, get_desc, start_thrs)			\
+	({								\
+		int _res;						\
+									\
+		netif_tx_stop_queue(txq);				\
+		/* Producer index and stop bit must be visible		\
+		 * to consumer before we recheck.			\
+		 * Pairs with a barrier in __netif_txq_maybe_wake().	\
+		 */							\
+		smp_mb__after_atomic();					\
+									\
+		/* We need to check again in a case another		\
+		 * CPU has just made room available.			\
+		 */							\
+		_res = 0;						\
+		if (unlikely(get_desc >= start_thrs)) {			\
+			netif_tx_start_queue(txq);			\
+			_res = -1;					\
+		}							\
+		_res;							\
+	})								\
+
+/**
+ * netif_txq_maybe_stop() - locklessly stop a Tx queue, if needed
+ * @txq:	struct netdev_queue to stop/start
+ * @get_desc:	get current number of free descriptors (see requirements below!)
+ * @stop_thrs:	minimal number of available descriptors for queue to be left
+ *		enabled
+ * @start_thrs:	minimal number of descriptors to re-enable the queue, can be
+ *		equal to @stop_thrs or higher to avoid frequent waking
+ *
+ * All arguments may be evaluated multiple times, beware of side effects.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ * Expected to be used from ndo_start_xmit, see the comment on top of the file.
+ *
+ * Returns:
+ *	 0 if the queue was stopped
+ *	 1 if the queue was left enabled
+ *	-1 if the queue was re-enabled (raced with waking)
+ */
+#define netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs)	\
+	({								\
+		int _res;						\
+									\
+		_res = 1;						\
+		if (unlikely(get_desc < stop_thrs))			\
+			_res = netif_txq_try_stop(txq, get_desc, start_thrs); \
+		_res;							\
+	})								\
+
+
+/**
+ * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
+ * @txq:	struct netdev_queue to stop/start
+ * @get_desc:	get current number of free descriptors (see requirements below!)
+ * @start_thrs:	minimal number of descriptors to re-enable the queue
+ * @down_cond:	down condition, predicate indicating that the queue should
+ *		not be woken up even if descriptors are available
+ *
+ * All arguments may be evaluated multiple times.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ *
+ * Returns:
+ *	 0 if the queue was woken up
+ *	 1 if the queue was already enabled (or disabled but @down_cond is true)
+ *	-1 if the queue was left unchanged (@start_thrs not reached)
+ */
+#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond)	\
+	({								\
+		int _res;						\
+									\
+		_res = -1;						\
+		if (likely(get_desc > start_thrs)) {			\
+			/* Make sure that anybody stopping the queue after \
+			 * this sees the new next_to_clean.		\
+			 */						\
+			smp_mb();					\
+			_res = 1;					\
+			if (unlikely(netif_tx_queue_stopped(txq)) &&	\
+			    !(down_cond)) {				\
+				netif_tx_wake_queue(txq);		\
+				_res = 0;				\
+			}						\
+		}							\
+		_res;							\
+	})
+
+#define netif_txq_maybe_wake(txq, get_desc, start_thrs)		\
+	__netif_txq_maybe_wake(txq, get_desc, start_thrs, false)
+
+/* subqueue variants follow */
+
+#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs)		\
+	({								\
+		struct netdev_queue *txq;				\
+									\
+		txq = netdev_get_tx_queue(dev, idx);			\
+		netif_txq_try_stop(txq, get_desc, start_thrs);		\
+	})
+
+#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
+	({								\
+		struct netdev_queue *txq;				\
+									\
+		txq = netdev_get_tx_queue(dev, idx);			\
+		netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs); \
+	})
+
+#endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 5/7] ixgbe: use new queue try_stop/try_wake macros
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (3 preceding siblings ...)
  2023-04-07  1:25 ` [PATCH net-next v4 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  1:25 ` [PATCH net-next v4 6/7] bnxt: " Jakub Kicinski
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, jesse.brandeburg,
	anthony.l.nguyen

Convert ixgbe to use the new macros, I think a lot of people
copy the ixgbe code. The only functional change is that the
unlikely() in ixgbe_clean_tx_irq() turns into a likely()
inside the new macro and no longer includes

  total_packets && netif_carrier_ok(tx_ring->netdev)

which is probably for the best, anyway.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - call netdev_get_tx_queue() locally, avoid the need for
   another layer of macros in the core

CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 38 +++++--------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 773c35fecace..cbbddee55db1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
 #include <net/tc_act/tc_mirred.h>
 #include <net/vxlan.h>
 #include <net/mpls.h>
+#include <net/netdev_queues.h>
 #include <net/xdp_sock_drv.h>
 #include <net/xfrm.h>
 
@@ -1119,6 +1120,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int total_bytes = 0, total_packets = 0, total_ipsec = 0;
 	unsigned int budget = q_vector->tx.work_limit;
 	unsigned int i = tx_ring->next_to_clean;
+	struct netdev_queue *txq;
 
 	if (test_bit(__IXGBE_DOWN, &adapter->state))
 		return true;
@@ -1253,20 +1255,12 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
-		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
-		/* Make sure that anybody stopping the queue after this
-		 * sees the new next_to_clean.
-		 */
-		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
-					     tx_ring->queue_index)
-		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
-					    tx_ring->queue_index);
-			++tx_ring->tx_stats.restart_queue;
-		}
-	}
+	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
+	if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	    !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring),
+				    TX_WAKE_THRESHOLD,
+				    test_bit(__IXGBE_DOWN, &adapter->state)))
+		++tx_ring->tx_stats.restart_queue;
 
 	return !!budget;
 }
@@ -8270,22 +8264,10 @@ static void ixgbe_tx_olinfo_status(union ixgbe_adv_tx_desc *tx_desc,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
-	/* Herbert's original patch had:
-	 *  smp_mb__after_netif_stop_queue();
-	 * but since that doesn't exist yet, just open code it.
-	 */
-	smp_mb();
-
-	/* We need to check again in a case another CPU has just
-	 * made room available.
-	 */
-	if (likely(ixgbe_desc_unused(tx_ring) < size))
+	if (!netif_subqueue_try_stop(tx_ring->netdev, tx_ring->queue_index,
+				     ixgbe_desc_unused(tx_ring), size))
 		return -EBUSY;
 
-	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 6/7] bnxt: use new queue try_stop/try_wake macros
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (4 preceding siblings ...)
  2023-04-07  1:25 ` [PATCH net-next v4 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-07  9:25   ` Eric Dumazet
  2023-04-07  1:25 ` [PATCH net-next v4 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
  2023-04-11  1:30 ` [PATCH net-next v4 0/7] net: lockless stop/wake combo macros patchwork-bot+netdevbpf
  7 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, Michael Chan

Convert bnxt to use new macros rather than open code the logic.
Two differences:
(1) bnxt_tx_int() will now only issue a memory barrier if it sees
    enough space on the ring to wake the queue. This should be fine,
    the mb() is between the writes to the ring pointers and checking
    queue state.
(2) we'll start the queue instead of waking on race, this should
    be safe inside the xmit handler.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - no change
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 40 ++++-------------------
 1 file changed, 7 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8ff5a4f98d6f..de97bee25249 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -56,6 +56,7 @@
 #include <linux/hwmon-sysfs.h>
 #include <net/page_pool.h>
 #include <linux/align.h>
+#include <net/netdev_queues.h>
 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
@@ -331,26 +332,6 @@ static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 	txr->kick_pending = 0;
 }
 
-static bool bnxt_txr_netif_try_stop_queue(struct bnxt *bp,
-					  struct bnxt_tx_ring_info *txr,
-					  struct netdev_queue *txq)
-{
-	netif_tx_stop_queue(txq);
-
-	/* netif_tx_stop_queue() must be done before checking
-	 * tx index in bnxt_tx_avail() below, because in
-	 * bnxt_tx_int(), we update tx index before checking for
-	 * netif_tx_queue_stopped().
-	 */
-	smp_mb();
-	if (bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh) {
-		netif_tx_wake_queue(txq);
-		return false;
-	}
-
-	return true;
-}
-
 static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct bnxt *bp = netdev_priv(dev);
@@ -384,7 +365,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (net_ratelimit() && txr->kick_pending)
 			netif_warn(bp, tx_err, dev,
 				   "bnxt: ring busy w/ flush pending!\n");
-		if (bnxt_txr_netif_try_stop_queue(bp, txr, txq))
+		if (!netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+					bp->tx_wake_thresh))
 			return NETDEV_TX_BUSY;
 	}
 
@@ -614,7 +596,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (netdev_xmit_more() && !tx_buf->is_push)
 			bnxt_txr_db_kick(bp, txr, prod);
 
-		bnxt_txr_netif_try_stop_queue(bp, txr, txq);
+		netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+				   bp->tx_wake_thresh);
 	}
 	return NETDEV_TX_OK;
 
@@ -708,17 +691,8 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
 	txr->tx_cons = cons;
 
-	/* Need to make the tx_cons update visible to bnxt_start_xmit()
-	 * before checking for netif_tx_queue_stopped().  Without the
-	 * memory barrier, there is a small possibility that bnxt_start_xmit()
-	 * will miss it and cause the queue to be stopped forever.
-	 */
-	smp_mb();
-
-	if (unlikely(netif_tx_queue_stopped(txq)) &&
-	    bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh &&
-	    READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING)
-		netif_tx_wake_queue(txq);
+	__netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
+			       READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v4 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (5 preceding siblings ...)
  2023-04-07  1:25 ` [PATCH net-next v4 6/7] bnxt: " Jakub Kicinski
@ 2023-04-07  1:25 ` Jakub Kicinski
  2023-04-11  1:30 ` [PATCH net-next v4 0/7] net: lockless stop/wake combo macros patchwork-bot+netdevbpf
  7 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07  1:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

Drivers call netdev_tx_completed_queue() right before
netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue()
should issue a memory barrier, so we can depend on that separating
the stop check from the consumer index update, instead of adding
another barrier in netif_txq_maybe_wake().

This matters more than the barriers on the xmit path, because
the wake condition is almost always true. So we issue the
consumer side barrier often.

Wrap netdev_tx_completed_queue() in a local helper to issue
the barrier even if BQL is disabled. Keep the same semantics
as netdev_tx_completed_queue() (barrier only if bytes != 0)
to make it clear that the barrier is conditional.

Plus since macro gets pkt/byte counts as arguments now -
we can skip waking if there were no packets completed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v4:
 - use IS_ENABLED()
 - improve commit message (add the 3rd paragraph)
v3: https://lore.kernel.org/all/20230405223134.94665-8-kuba@kernel.org/
 - new patch

CC: michael.chan@broadcom.com
CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  6 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++---
 include/linux/netdevice.h                     |  2 +-
 include/net/netdev_queues.h                   | 39 ++++++++++++++-----
 4 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index de97bee25249..f7602d8d79e3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -688,11 +688,11 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 		dev_kfree_skb_any(skb);
 	}
 
-	netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
 	txr->tx_cons = cons;
 
-	__netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
-			       READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
+	__netif_txq_completed_wake(txq, nr_pkts, tx_bytes,
+				   bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
+				   READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbbddee55db1..f2604fc05991 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1251,15 +1251,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (ring_is_xdp(tx_ring))
 		return !!budget;
 
-	netdev_tx_completed_queue(txring_txq(tx_ring),
-				  total_packets, total_bytes);
-
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
 	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
-	if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
-	    !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring),
-				    TX_WAKE_THRESHOLD,
-				    test_bit(__IXGBE_DOWN, &adapter->state)))
+	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
+					ixgbe_desc_unused(tx_ring),
+					TX_WAKE_THRESHOLD,
+					netif_carrier_ok(tx_ring->netdev) &&
+					test_bit(__IXGBE_DOWN, &adapter->state)))
 		++tx_ring->tx_stats.restart_queue;
 
 	return !!budget;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18770d325499..10736fc6d85e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3538,7 +3538,7 @@ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 	 * netdev_tx_sent_queue will miss the update and cause the queue to
 	 * be stopped forever
 	 */
-	smp_mb();
+	smp_mb(); /* NOTE: netdev_txq_completed_mb() assumes this exists */
 
 	if (unlikely(dql_avail(&dev_queue->dql) < 0))
 		return;
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 5236d78bbdeb..b26fdb441e39 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -38,7 +38,7 @@
 		netif_tx_stop_queue(txq);				\
 		/* Producer index and stop bit must be visible		\
 		 * to consumer before we recheck.			\
-		 * Pairs with a barrier in __netif_txq_maybe_wake().	\
+		 * Pairs with a barrier in __netif_txq_completed_wake(). \
 		 */							\
 		smp_mb__after_atomic();					\
 									\
@@ -82,10 +82,24 @@
 		_res;							\
 	})								\
 
+/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if
+ * @bytes != 0, regardless of kernel config.
+ */
+static inline void
+netdev_txq_completed_mb(struct netdev_queue *dev_queue,
+			unsigned int pkts, unsigned int bytes)
+{
+	if (IS_ENABLED(CONFIG_BQL))
+		netdev_tx_completed_queue(dev_queue, pkts, bytes);
+	else if (bytes)
+		smp_mb();
+}
 
 /**
- * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
+ * __netif_txq_completed_wake() - locklessly wake a Tx queue, if needed
  * @txq:	struct netdev_queue to stop/start
+ * @pkts:	number of packets completed
+ * @bytes:	number of bytes completed
  * @get_desc:	get current number of free descriptors (see requirements below!)
  * @start_thrs:	minimal number of descriptors to re-enable the queue
  * @down_cond:	down condition, predicate indicating that the queue should
@@ -94,22 +108,27 @@
  * All arguments may be evaluated multiple times.
  * @get_desc must be a formula or a function call, it must always
  * return up-to-date information when evaluated!
+ * Reports completed pkts/bytes to BQL.
  *
  * Returns:
  *	 0 if the queue was woken up
  *	 1 if the queue was already enabled (or disabled but @down_cond is true)
  *	-1 if the queue was left unchanged (@start_thrs not reached)
  */
-#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond)	\
+#define __netif_txq_completed_wake(txq, pkts, bytes,			\
+				   get_desc, start_thrs, down_cond)	\
 	({								\
 		int _res;						\
 									\
+		/* Report to BQL and piggy back on its barrier.		\
+		 * Barrier makes sure that anybody stopping the queue	\
+		 * after this point sees the new consumer index.	\
+		 * Pairs with barrier in netif_txq_try_stop().		\
+		 */							\
+		netdev_txq_completed_mb(txq, pkts, bytes);		\
+									\
 		_res = -1;						\
-		if (likely(get_desc > start_thrs)) {			\
-			/* Make sure that anybody stopping the queue after \
-			 * this sees the new next_to_clean.		\
-			 */						\
-			smp_mb();					\
+		if (pkts && likely(get_desc > start_thrs)) {		\
 			_res = 1;					\
 			if (unlikely(netif_tx_queue_stopped(txq)) &&	\
 			    !(down_cond)) {				\
@@ -120,8 +139,8 @@
 		_res;							\
 	})
 
-#define netif_txq_maybe_wake(txq, get_desc, start_thrs)		\
-	__netif_txq_maybe_wake(txq, get_desc, start_thrs, false)
+#define netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs) \
+	__netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs, false)
 
 /* subqueue variants follow */
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v4 6/7] bnxt: use new queue try_stop/try_wake macros
  2023-04-07  1:25 ` [PATCH net-next v4 6/7] bnxt: " Jakub Kicinski
@ 2023-04-07  9:25   ` Eric Dumazet
  2023-04-07 14:36     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2023-04-07  9:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Michael Chan

On Fri, Apr 7, 2023 at 3:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Convert bnxt to use new macros rather than open code the logic.
> Two differences:
> (1) bnxt_tx_int() will now only issue a memory barrier if it sees
>     enough space on the ring to wake the queue. This should be fine,
>     the mb() is between the writes to the ring pointers and checking
>     queue state.
> (2) we'll start the queue instead of waking on race, this should
>     be safe inside the xmit handler.

...

>                                    "bnxt: ring busy w/ flush pending!\n");
> -               if (bnxt_txr_netif_try_stop_queue(bp, txr, txq))
> +               if (!netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
> +                                       bp->tx_wake_thresh))
>                         return NETDEV_TX_BUSY;
>         }
>

BTW, we should make sure the @desc argument of netif_txq_try_stop() is
correctly implemented.
I often see drivers with either buggy or not efficient implementation.

For instance, bnxt could perhaps be slightly described more accurately with:

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 18cac98ba58e86c53eb87ed592424ed719b2f892..ec4a2d4dea3c58de08e51c1d65b3d2fd75d71e6a
100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2231,13 +2231,12 @@ struct bnxt {
 #define SFF_MODULE_ID_QSFP28                   0x11
 #define BNXT_MAX_PHY_I2C_RESP_SIZE             64

-static inline u32 bnxt_tx_avail(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
+static inline u32 bnxt_tx_avail(const struct bnxt *bp,
+                               const struct bnxt_tx_ring_info *txr)
 {
-       /* Tell compiler to fetch tx indices from memory. */
-       barrier();
+       u32 used = READ_ONCE(txr->tx_prod) - READ_ONCE(txr->tx_cons);

-       return bp->tx_ring_size -
-               ((txr->tx_prod - txr->tx_cons) & bp->tx_ring_mask);
+       return bp->tx_ring_size - (used & bp->tx_ring_mask);
 }

 static inline void bnxt_writeq(struct bnxt *bp, u64 val,


In the same vein, mlx4 would probably need something like:

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 2f79378fbf6ec106bf78d0fd12e911ff200ba8d7..13557701e254e7aeff057abd26a88d19c3a5cd69
100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -226,9 +226,11 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
                       MLX4_QP_STATE_RST, NULL, 0, 0, &ring->sp_qp);
 }

-static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
 {
-       return ring->prod - ring->cons > ring->full_size;
+       u32 used = READ_ONCE(ring->prod) - READ_ONCE(ring->cons);
+
+       return used > ring->full_size;
 }

 static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v4 6/7] bnxt: use new queue try_stop/try_wake macros
  2023-04-07  9:25   ` Eric Dumazet
@ 2023-04-07 14:36     ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-04-07 14:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Michael Chan

On Fri, 7 Apr 2023 11:25:23 +0200 Eric Dumazet wrote:
> BTW, we should make sure the @desc argument of netif_txq_try_stop() is
> correctly implemented.
> I often see drivers with either buggy or not efficient implementation.

What do you have in mind?

Should we update the sample code in driver.rst further?
There is outdated sample code there:
https://www.kernel.org/doc/html/next/networking/driver.html
but IDK if people read the documentation :(

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v4 0/7] net: lockless stop/wake combo macros
  2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (6 preceding siblings ...)
  2023-04-07  1:25 ` [PATCH net-next v4 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
@ 2023-04-11  1:30 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-11  1:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, herbert, alexander.duyck,
	hkallweit1, andrew, willemb

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  6 Apr 2023 18:25:29 -0700 you wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
> The original code dates back all the way to e1000 and Linux 2.6.19.
> 
> v4:
>  - adjust a comment in patch 4
>  - use IS_ENABLED() in patch 7
> v3: https://lore.kernel.org/all/20230405223134.94665-1-kuba@kernel.org/
>  - render the info as part of documentation, maybe someone will
>    notice and use it (patches 1, 2, 3 are new)
>  - use the __after_atomic barrier
>  - add last patch to avoid a barrier in the wake path
> more detailed change logs in the patches.
> 
> [...]

Here is the summary with links:
  - [net-next,v4,1/7] docs: net: reformat driver.rst from a list to sections
    https://git.kernel.org/netdev/net-next/c/d2f5c68e3f71
  - [net-next,v4,2/7] docs: net: move the probe and open/close sections of driver.rst up
    https://git.kernel.org/netdev/net-next/c/da4f0f82ee9d
  - [net-next,v4,3/7] docs: net: use C syntax highlight in driver.rst
    https://git.kernel.org/netdev/net-next/c/8336462539ae
  - [net-next,v4,4/7] net: provide macros for commonly copied lockless queue stop/wake code
    https://git.kernel.org/netdev/net-next/c/c91c46de6bbc
  - [net-next,v4,5/7] ixgbe: use new queue try_stop/try_wake macros
    https://git.kernel.org/netdev/net-next/c/9ded5bc77fe5
  - [net-next,v4,6/7] bnxt: use new queue try_stop/try_wake macros
    https://git.kernel.org/netdev/net-next/c/08a096780d92
  - [net-next,v4,7/7] net: piggy back on the memory barrier in bql when waking queues
    https://git.kernel.org/netdev/net-next/c/301f227fc860

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-04-11  1:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-07  1:25 [PATCH net-next v4 0/7] net: lockless stop/wake combo macros Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 6/7] bnxt: " Jakub Kicinski
2023-04-07  9:25   ` Eric Dumazet
2023-04-07 14:36     ` Jakub Kicinski
2023-04-07  1:25 ` [PATCH net-next v4 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
2023-04-11  1:30 ` [PATCH net-next v4 0/7] net: lockless stop/wake combo macros patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).