All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/7] net: lockless stop/wake combo macros
@ 2023-04-05 22:31 Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
The original code dates back all the way to e1000 and Linux 2.6.19.

v3:
 - render the info as part of documentation, maybe someone will
   notice and use it (patches 1, 2, 3 are new)
 - use the __after_atomic barrier
 - add last patch to avoid a barrier in the wake path
more detailed change logs in the patches.

v2: https://lore.kernel.org/all/20230401051221.3160913-2-kuba@kernel.org/
 - really flip the unlikely into a likely in __netif_tx_queue_maybe_wake()
 - convert if / else into pre-init of _ret
v1: https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
 - perdicate -> predicate
 - on race use start instead of wake and make a note of that
   in the doc / comment at the start
rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/

Jakub Kicinski (7):
  docs: net: reformat driver.rst from a list to sections
  docs: net: move the probe and open/close sections of driver.rst up
  docs: net: use C syntax highlight in driver.rst
  net: provide macros for commonly copied lockless queue stop/wake code
  ixgbe: use new queue try_stop/try_wake macros
  bnxt: use new queue try_stop/try_wake macros
  net: piggy back on the memory barrier in bql when waking queues

 Documentation/networking/driver.rst           | 119 ++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  42 +----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  42 ++---
 include/linux/netdevice.h                     |   3 +-
 include/net/netdev_queues.h                   | 165 ++++++++++++++++++
 5 files changed, 264 insertions(+), 107 deletions(-)
 create mode 100644 include/net/netdev_queues.h

-- 
2.39.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 1/7] docs: net: reformat driver.rst from a list to sections
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

driver.rst had a historical form of list of common problems.
In the age os Sphinx and rendered documentation it's better
to use the more usual title + text format.

This will allow us to render kdoc into the output more naturally.

No changes to the actual text.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 91 ++++++++++++++++++-----------
 1 file changed, 56 insertions(+), 35 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 64f7236ff10b..3040a74d421c 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -4,15 +4,19 @@
 Softnet Driver Issues
 =====================
 
-Transmit path guidelines:
+Transmit path guidelines
+========================
 
-1) The ndo_start_xmit method must not return NETDEV_TX_BUSY under
-   any normal circumstances.  It is considered a hard error unless
-   there is no way your device can tell ahead of time when its
-   transmit function will become busy.
+Stop queues in advance
+----------------------
 
-   Instead it must maintain the queue properly.  For example,
-   for a driver implementing scatter-gather this means::
+The ndo_start_xmit method must not return NETDEV_TX_BUSY under
+any normal circumstances.  It is considered a hard error unless
+there is no way your device can tell ahead of time when its
+transmit function will become busy.
+
+Instead it must maintain the queue properly.  For example,
+for a driver implementing scatter-gather this means::
 
 	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
 					       struct net_device *dev)
@@ -42,56 +46,73 @@ Softnet Driver Issues
 		return NETDEV_TX_OK;
 	}
 
-   And then at the end of your TX reclamation event handling::
+And then at the end of your TX reclamation event handling::
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(dp->dev);
 
-   For a non-scatter-gather supporting card, the three tests simply become::
+For a non-scatter-gather supporting card, the three tests simply become::
 
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= 0)
 
-   and::
+and::
 
 		if (TX_BUFFS_AVAIL(dp) == 0)
 
-   and::
+and::
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > 0)
 		netif_wake_queue(dp->dev);
 
-2) An ndo_start_xmit method must not modify the shared parts of a
-   cloned SKB.
+No exclusive ownership
+----------------------
+
+An ndo_start_xmit method must not modify the shared parts of a
+cloned SKB.
+
+Timely completions
+------------------
+
+Do not forget that once you return NETDEV_TX_OK from your
+ndo_start_xmit method, it is your driver's responsibility to free
+up the SKB and in some finite amount of time.
 
-3) Do not forget that once you return NETDEV_TX_OK from your
-   ndo_start_xmit method, it is your driver's responsibility to free
-   up the SKB and in some finite amount of time.
+For example, this means that it is not allowed for your TX
+mitigation scheme to let TX packets "hang out" in the TX
+ring unreclaimed forever if no new TX packets are sent.
+This error can deadlock sockets waiting for send buffer room
+to be freed up.
 
-   For example, this means that it is not allowed for your TX
-   mitigation scheme to let TX packets "hang out" in the TX
-   ring unreclaimed forever if no new TX packets are sent.
-   This error can deadlock sockets waiting for send buffer room
-   to be freed up.
+If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
+must not keep any reference to that SKB and you must not attempt
+to free it up.
 
-   If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
-   must not keep any reference to that SKB and you must not attempt
-   to free it up.
+Probing guidelines
+==================
 
-Probing guidelines:
+Address validation
+------------------
+
+Any hardware layer address you obtain for your device should
+be verified.  For example, for ethernet check it with
+linux/etherdevice.h:is_valid_ether_addr()
+
+Close/stop guidelines
+=====================
 
-1) Any hardware layer address you obtain for your device should
-   be verified.  For example, for ethernet check it with
-   linux/etherdevice.h:is_valid_ether_addr()
+Quiescence
+----------
 
-Close/stop guidelines:
+After the ndo_stop routine has been called, the hardware must
+not receive or transmit any data.  All in flight packets must
+be aborted. If necessary, poll or wait for completion of
+any reset commands.
 
-1) After the ndo_stop routine has been called, the hardware must
-   not receive or transmit any data.  All in flight packets must
-   be aborted. If necessary, poll or wait for completion of
-   any reset commands.
+Auto-close
+----------
 
-2) The ndo_stop routine will be called by unregister_netdevice
-   if device is still UP.
+The ndo_stop routine will be called by unregister_netdevice
+if device is still UP.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 2/7] docs: net: move the probe and open/close sections of driver.rst up
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

Somehow it feels more right to start from the probe then open,
then tx... Much like the lifetime of the driver itself.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 54 ++++++++++++++---------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 3040a74d421c..bfbd66871bb3 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -4,6 +4,33 @@
 Softnet Driver Issues
 =====================
 
+Probing guidelines
+==================
+
+Address validation
+------------------
+
+Any hardware layer address you obtain for your device should
+be verified.  For example, for ethernet check it with
+linux/etherdevice.h:is_valid_ether_addr()
+
+Close/stop guidelines
+=====================
+
+Quiescence
+----------
+
+After the ndo_stop routine has been called, the hardware must
+not receive or transmit any data.  All in flight packets must
+be aborted. If necessary, poll or wait for completion of
+any reset commands.
+
+Auto-close
+----------
+
+The ndo_stop routine will be called by unregister_netdevice
+if device is still UP.
+
 Transmit path guidelines
 ========================
 
@@ -89,30 +116,3 @@ to be freed up.
 If you return NETDEV_TX_BUSY from the ndo_start_xmit method, you
 must not keep any reference to that SKB and you must not attempt
 to free it up.
-
-Probing guidelines
-==================
-
-Address validation
-------------------
-
-Any hardware layer address you obtain for your device should
-be verified.  For example, for ethernet check it with
-linux/etherdevice.h:is_valid_ether_addr()
-
-Close/stop guidelines
-=====================
-
-Quiescence
-----------
-
-After the ndo_stop routine has been called, the hardware must
-not receive or transmit any data.  All in flight packets must
-be aborted. If necessary, poll or wait for completion of
-any reset commands.
-
-Auto-close
-----------
-
-The ndo_stop routine will be called by unregister_netdevice
-if device is still UP.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 3/7] docs: net: use C syntax highlight in driver.rst
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

Use syntax highlight, comment out the "..." since they are
not valid C.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/driver.rst | 30 +++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index bfbd66871bb3..19c363291d04 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -43,7 +43,9 @@ there is no way your device can tell ahead of time when its
 transmit function will become busy.
 
 Instead it must maintain the queue properly.  For example,
-for a driver implementing scatter-gather this means::
+for a driver implementing scatter-gather this means:
+
+.. code-block:: c
 
 	static netdev_tx_t drv_hard_start_xmit(struct sk_buff *skb,
 					       struct net_device *dev)
@@ -51,7 +53,7 @@ Instead it must maintain the queue properly.  For example,
 		struct drv *dp = netdev_priv(dev);
 
 		lock_tx(dp);
-		...
+		//...
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 			netif_stop_queue(dev);
@@ -61,34 +63,42 @@ Instead it must maintain the queue properly.  For example,
 			return NETDEV_TX_BUSY;
 		}
 
-		... queue packet to card ...
-		... update tx consumer index ...
+		//... queue packet to card ...
+		//... update tx consumer index ...
 
 		if (TX_BUFFS_AVAIL(dp) <= (MAX_SKB_FRAGS + 1))
 			netif_stop_queue(dev);
 
-		...
+		//...
 		unlock_tx(dp);
-		...
+		//...
 		return NETDEV_TX_OK;
 	}
 
-And then at the end of your TX reclamation event handling::
+And then at the end of your TX reclamation event handling:
+
+.. code-block:: c
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > (MAX_SKB_FRAGS + 1))
 		netif_wake_queue(dp->dev);
 
-For a non-scatter-gather supporting card, the three tests simply become::
+For a non-scatter-gather supporting card, the three tests simply become:
+
+.. code-block:: c
 
 		/* This is a hard error log it. */
 		if (TX_BUFFS_AVAIL(dp) <= 0)
 
-and::
+and:
+
+.. code-block:: c
 
 		if (TX_BUFFS_AVAIL(dp) == 0)
 
-and::
+and:
+
+.. code-block:: c
 
 	if (netif_queue_stopped(dp->dev) &&
 	    TX_BUFFS_AVAIL(dp) > 0)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (2 preceding siblings ...)
  2023-04-05 22:31 ` [PATCH net-next v3 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-06  7:22   ` Herbert Xu
  2023-04-05 22:31 ` [PATCH net-next v3 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski

A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
The original code dates back all the way to e1000 and Linux 2.6.19.

Smaller drivers shy away from the scheme and introduce a lock
which may cause deadlocks in netpoll.

Provide macros which encapsulate the necessary logic.

The macros do not prevent false wake ups, the extra barrier
required to close that race is not worth it. See discussion in:
https://lore.kernel.org/all/c39312a2-4537-14b4-270c-9fe1fbb91e89@gmail.com/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - use smp_mb__after_atomic()
 - improve the comments on barriers
 - document the possibility of false wakes
 - rename netif_tx_queue_* -> netif_txq_*
v2: https://lore.kernel.org/all/20230401051221.3160913-2-kuba@kernel.org/
 - really flip the unlikely into a likely in __netif_tx_queue_maybe_wake()
 - convert if / else into pre-init of _ret
v1: https://lore.kernel.org/all/20230322233028.269410-1-kuba@kernel.org/
 - perdicate -> predicate
 - on race use start instead of wake and make a note of that
   in the doc / comment at the start
rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
---
 Documentation/networking/driver.rst |   6 ++
 include/linux/netdevice.h           |   1 +
 include/net/netdev_queues.h         | 144 ++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)
 create mode 100644 include/net/netdev_queues.h

diff --git a/Documentation/networking/driver.rst b/Documentation/networking/driver.rst
index 19c363291d04..4071f2c00f8b 100644
--- a/Documentation/networking/driver.rst
+++ b/Documentation/networking/driver.rst
@@ -104,6 +104,12 @@ Instead it must maintain the queue properly.  For example,
 	    TX_BUFFS_AVAIL(dp) > 0)
 		netif_wake_queue(dp->dev);
 
+Lockless queue stop / wake helper macros
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/net/netdev_queues.h
+   :doc: Lockless queue stopping / waking helpers.
+
 No exclusive ownership
 ----------------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a740be3bb911..18770d325499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3341,6 +3341,7 @@ static inline void netif_tx_wake_all_queues(struct net_device *dev)
 
 static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 {
+	/* Must be an atomic op see netif_txq_try_stop() */
 	set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
 }
 
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
new file mode 100644
index 000000000000..60a9ac5439b3
--- /dev/null
+++ b/include/net/netdev_queues.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NET_QUEUES_H
+#define _LINUX_NET_QUEUES_H
+
+#include <linux/netdevice.h>
+
+/**
+ * DOC: Lockless queue stopping / waking helpers.
+ *
+ * The netif_txq_maybe_stop() and __netif_txq_completed_wake()
+ * macros are designed to safely implement stopping
+ * and waking netdev queues without full lock protection.
+ *
+ * We assume that there can be no concurrent stop attempts and no concurrent
+ * wake attempts. The try-stop should happen from the xmit handler,
+ * while wake up should be triggered from NAPI poll context.
+ * The two may run concurrently (single producer, single consumer).
+ *
+ * The try-stop side is expected to run from the xmit handler and therefore
+ * it does not reschedule Tx (netif_tx_start_queue() instead of
+ * netif_tx_wake_queue()). Uses of the ``stop`` macros outside of the xmit
+ * handler may lead to xmit queue being enabled but not run.
+ * The waking side does not have similar context restrictions.
+ *
+ * The macros guarantee that rings will not remain stopped if there's
+ * space available, but they do *not* prevent false wake ups when
+ * the ring is full! Drivers should check for ring full at the start
+ * for the xmit handler.
+ *
+ * All descriptor ring indexes (and other relevant shared state) must
+ * be updated before invoking the macros.
+ */
+
+#define netif_txq_try_stop(txq, get_desc, start_thrs)			\
+	({								\
+		int _res;						\
+									\
+		netif_tx_stop_queue(txq);				\
+		/* Producer index and stop bit must be visible		\
+		 * to consumer before we recheck.			\
+		 * Pairs with a barrier in __netif_txq_maybe_wake().	\
+		 */							\
+		smp_mb__after_atomic();					\
+									\
+		/* We need to check again in a case another		\
+		 * CPU has just made room available.			\
+		 */							\
+		_res = 0;						\
+		if (unlikely(get_desc >= start_thrs)) {			\
+			netif_tx_start_queue(txq);			\
+			_res = -1;					\
+		}							\
+		_res;							\
+	})								\
+
+/**
+ * netif_txq_maybe_stop() - locklessly stop a Tx queue, if needed
+ * @txq:	struct netdev_queue to stop/start
+ * @get_desc:	get current number of free descriptors (see requirements below!)
+ * @stop_thrs:	minimal number of available descriptors for queue to be left
+ *		enabled
+ * @start_thrs:	minimal number of descriptors to re-enable the queue, can be
+ *		equal to @stop_thrs or higher to avoid frequent waking
+ *
+ * All arguments may be evaluated multiple times, beware of side effects.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ * Expected to be used from ndo_start_xmit, see the comment on top of the file.
+ *
+ * Returns:
+ *	 0 if the queue was stopped
+ *	 1 if the queue was left enabled
+ *	-1 if the queue was re-enabled (raced with waking)
+ */
+#define netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs)	\
+	({								\
+		int _res;						\
+									\
+		_res = 1;						\
+		if (unlikely(get_desc < stop_thrs))			\
+			_res = netif_txq_try_stop(txq, get_desc, start_thrs); \
+		_res;							\
+	})								\
+
+
+/**
+ * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
+ * @txq:	struct netdev_queue to stop/start
+ * @get_desc:	get current number of free descriptors (see requirements below!)
+ * @start_thrs:	minimal number of descriptors to re-enable the queue
+ * @down_cond:	down condition, predicate indicating that the queue should
+ *		not be woken up even if descriptors are available
+ *
+ * All arguments may be evaluated multiple times.
+ * @get_desc must be a formula or a function call, it must always
+ * return up-to-date information when evaluated!
+ *
+ * Returns:
+ *	 0 if the queue was woken up
+ *	 1 if the queue was already enabled (or disabled but @down_cond is true)
+ *	-1 if the queue was left stopped
+ */
+#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond)	\
+	({								\
+		int _res;						\
+									\
+		_res = -1;						\
+		if (likely(get_desc > start_thrs)) {			\
+			/* Make sure that anybody stopping the queue after \
+			 * this sees the new next_to_clean.		\
+			 */						\
+			smp_mb();					\
+			_res = 1;					\
+			if (unlikely(netif_tx_queue_stopped(txq)) &&	\
+			    !(down_cond)) {				\
+				netif_tx_wake_queue(txq);		\
+				_res = 0;				\
+			}						\
+		}							\
+		_res;							\
+	})
+
+#define netif_txq_maybe_wake(txq, get_desc, start_thrs)		\
+	__netif_txq_maybe_wake(txq, get_desc, start_thrs, false)
+
+/* subqueue variants follow */
+
+#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs)		\
+	({								\
+		struct netdev_queue *txq;				\
+									\
+		txq = netdev_get_tx_queue(dev, idx);			\
+		netif_txq_try_stop(txq, get_desc, start_thrs);		\
+	})
+
+#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
+	({								\
+		struct netdev_queue *txq;				\
+									\
+		txq = netdev_get_tx_queue(dev, idx);			\
+		netif_txq_maybe_stop(txq, get_desc, stop_thrs, start_thrs); \
+	})
+
+#endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 5/7] ixgbe: use new queue try_stop/try_wake macros
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (3 preceding siblings ...)
  2023-04-05 22:31 ` [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 6/7] bnxt: " Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
  6 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, jesse.brandeburg,
	anthony.l.nguyen

Convert ixgbe to use the new macros, I think a lot of people
copy the ixgbe code. The only functional change is that the
unlikely() in ixgbe_clean_tx_irq() turns into a likely()
inside the new macro and no longer includes

  total_packets && netif_carrier_ok(tx_ring->netdev)

which is probably for the best, anyway.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - call netdev_get_tx_queue() locally, avoid the need for
   another layer of macros in the core

CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 38 +++++--------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 773c35fecace..cbbddee55db1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -36,6 +36,7 @@
 #include <net/tc_act/tc_mirred.h>
 #include <net/vxlan.h>
 #include <net/mpls.h>
+#include <net/netdev_queues.h>
 #include <net/xdp_sock_drv.h>
 #include <net/xfrm.h>
 
@@ -1119,6 +1120,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int total_bytes = 0, total_packets = 0, total_ipsec = 0;
 	unsigned int budget = q_vector->tx.work_limit;
 	unsigned int i = tx_ring->next_to_clean;
+	struct netdev_queue *txq;
 
 	if (test_bit(__IXGBE_DOWN, &adapter->state))
 		return true;
@@ -1253,20 +1255,12 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
-		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
-		/* Make sure that anybody stopping the queue after this
-		 * sees the new next_to_clean.
-		 */
-		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
-					     tx_ring->queue_index)
-		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
-					    tx_ring->queue_index);
-			++tx_ring->tx_stats.restart_queue;
-		}
-	}
+	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
+	if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	    !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring),
+				    TX_WAKE_THRESHOLD,
+				    test_bit(__IXGBE_DOWN, &adapter->state)))
+		++tx_ring->tx_stats.restart_queue;
 
 	return !!budget;
 }
@@ -8270,22 +8264,10 @@ static void ixgbe_tx_olinfo_status(union ixgbe_adv_tx_desc *tx_desc,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
-
-	/* Herbert's original patch had:
-	 *  smp_mb__after_netif_stop_queue();
-	 * but since that doesn't exist yet, just open code it.
-	 */
-	smp_mb();
-
-	/* We need to check again in a case another CPU has just
-	 * made room available.
-	 */
-	if (likely(ixgbe_desc_unused(tx_ring) < size))
+	if (!netif_subqueue_try_stop(tx_ring->netdev, tx_ring->queue_index,
+				     ixgbe_desc_unused(tx_ring), size))
 		return -EBUSY;
 
-	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 6/7] bnxt: use new queue try_stop/try_wake macros
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (4 preceding siblings ...)
  2023-04-05 22:31 ` [PATCH net-next v3 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-05 22:31 ` [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
  6 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, Michael Chan

Convert bnxt to use new macros rather than open code the logic.
Two differences:
(1) bnxt_tx_int() will now only issue a memory barrier if it sees
    enough space on the ring to wake the queue. This should be fine,
    the mb() is between the writes to the ring pointers and checking
    queue state.
(2) we'll start the queue instead of waking on race, this should
    be safe inside the xmit handler.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - no change
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 40 ++++-------------------
 1 file changed, 7 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8ff5a4f98d6f..de97bee25249 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -56,6 +56,7 @@
 #include <linux/hwmon-sysfs.h>
 #include <net/page_pool.h>
 #include <linux/align.h>
+#include <net/netdev_queues.h>
 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
@@ -331,26 +332,6 @@ static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
 	txr->kick_pending = 0;
 }
 
-static bool bnxt_txr_netif_try_stop_queue(struct bnxt *bp,
-					  struct bnxt_tx_ring_info *txr,
-					  struct netdev_queue *txq)
-{
-	netif_tx_stop_queue(txq);
-
-	/* netif_tx_stop_queue() must be done before checking
-	 * tx index in bnxt_tx_avail() below, because in
-	 * bnxt_tx_int(), we update tx index before checking for
-	 * netif_tx_queue_stopped().
-	 */
-	smp_mb();
-	if (bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh) {
-		netif_tx_wake_queue(txq);
-		return false;
-	}
-
-	return true;
-}
-
 static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct bnxt *bp = netdev_priv(dev);
@@ -384,7 +365,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (net_ratelimit() && txr->kick_pending)
 			netif_warn(bp, tx_err, dev,
 				   "bnxt: ring busy w/ flush pending!\n");
-		if (bnxt_txr_netif_try_stop_queue(bp, txr, txq))
+		if (!netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+					bp->tx_wake_thresh))
 			return NETDEV_TX_BUSY;
 	}
 
@@ -614,7 +596,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (netdev_xmit_more() && !tx_buf->is_push)
 			bnxt_txr_db_kick(bp, txr, prod);
 
-		bnxt_txr_netif_try_stop_queue(bp, txr, txq);
+		netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+				   bp->tx_wake_thresh);
 	}
 	return NETDEV_TX_OK;
 
@@ -708,17 +691,8 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 	netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
 	txr->tx_cons = cons;
 
-	/* Need to make the tx_cons update visible to bnxt_start_xmit()
-	 * before checking for netif_tx_queue_stopped().  Without the
-	 * memory barrier, there is a small possibility that bnxt_start_xmit()
-	 * will miss it and cause the queue to be stopped forever.
-	 */
-	smp_mb();
-
-	if (unlikely(netif_tx_queue_stopped(txq)) &&
-	    bnxt_tx_avail(bp, txr) >= bp->tx_wake_thresh &&
-	    READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING)
-		netif_tx_wake_queue(txq);
+	__netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
+			       READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
                   ` (5 preceding siblings ...)
  2023-04-05 22:31 ` [PATCH net-next v3 6/7] bnxt: " Jakub Kicinski
@ 2023-04-05 22:31 ` Jakub Kicinski
  2023-04-06  7:35   ` Herbert Xu
  6 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-05 22:31 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, herbert, alexander.duyck, hkallweit1,
	andrew, willemb, Jakub Kicinski, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

Drivers call netdev_tx_completed_queue() right before
netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue()
should issue a memory barrier, so we can depend on that separating
the stop check from the consumer index update, instead of adding
another barrier in netif_txq_maybe_wake().

This matters more than the barriers on the xmit path, because
the wake condition is almost always true. So we issue the
consumer side barrier often.

Plus since macro gets pkt/byte counts as arguments now -
we can skip waking if there were no packets completed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
 - new patch

CC: michael.chan@broadcom.com
CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  6 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++---
 include/linux/netdevice.h                     |  2 +-
 include/net/netdev_queues.h                   | 41 ++++++++++++++-----
 4 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index de97bee25249..f7602d8d79e3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -688,11 +688,11 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
 		dev_kfree_skb_any(skb);
 	}
 
-	netdev_tx_completed_queue(txq, nr_pkts, tx_bytes);
 	txr->tx_cons = cons;
 
-	__netif_txq_maybe_wake(txq, bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
-			       READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
+	__netif_txq_completed_wake(txq, nr_pkts, tx_bytes,
+				   bnxt_tx_avail(bp, txr), bp->tx_wake_thresh,
+				   READ_ONCE(txr->dev_state) != BNXT_DEV_STATE_CLOSING);
 }
 
 static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbbddee55db1..f2604fc05991 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1251,15 +1251,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (ring_is_xdp(tx_ring))
 		return !!budget;
 
-	netdev_tx_completed_queue(txring_txq(tx_ring),
-				  total_packets, total_bytes);
-
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
 	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
-	if (total_packets && netif_carrier_ok(tx_ring->netdev) &&
-	    !__netif_txq_maybe_wake(txq, ixgbe_desc_unused(tx_ring),
-				    TX_WAKE_THRESHOLD,
-				    test_bit(__IXGBE_DOWN, &adapter->state)))
+	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
+					ixgbe_desc_unused(tx_ring),
+					TX_WAKE_THRESHOLD,
+					netif_carrier_ok(tx_ring->netdev) &&
+					test_bit(__IXGBE_DOWN, &adapter->state)))
 		++tx_ring->tx_stats.restart_queue;
 
 	return !!budget;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18770d325499..10736fc6d85e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3538,7 +3538,7 @@ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 	 * netdev_tx_sent_queue will miss the update and cause the queue to
 	 * be stopped forever
 	 */
-	smp_mb();
+	smp_mb(); /* NOTE: netdev_txq_completed_mb() assumes this exists */
 
 	if (unlikely(dql_avail(&dev_queue->dql) < 0))
 		return;
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 60a9ac5439b3..3e38fe8300a7 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -38,7 +38,7 @@
 		netif_tx_stop_queue(txq);				\
 		/* Producer index and stop bit must be visible		\
 		 * to consumer before we recheck.			\
-		 * Pairs with a barrier in __netif_txq_maybe_wake().	\
+		 * Pairs with a barrier in __netif_txq_completed_wake(). \
 		 */							\
 		smp_mb__after_atomic();					\
 									\
@@ -82,10 +82,26 @@
 		_res;							\
 	})								\
 
+/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if
+ * @bytes != 0, regardless of kernel config.
+ */
+static inline void
+netdev_txq_completed_mb(struct netdev_queue *dev_queue,
+			unsigned int pkts, unsigned int bytes)
+{
+#ifdef CONFIG_BQL
+	netdev_tx_completed_queue(dev_queue, pkts, bytes);
+#else
+	if (bytes)
+		smp_mb();
+#endif
+}
 
 /**
- * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
+ * __netif_txq_completed_wake() - locklessly wake a Tx queue, if needed
  * @txq:	struct netdev_queue to stop/start
+ * @pkts:	number of packets completed
+ * @bytes:	number of bytes completed
  * @get_desc:	get current number of free descriptors (see requirements below!)
  * @start_thrs:	minimal number of descriptors to re-enable the queue
  * @down_cond:	down condition, predicate indicating that the queue should
@@ -94,22 +110,27 @@
  * All arguments may be evaluated multiple times.
  * @get_desc must be a formula or a function call, it must always
  * return up-to-date information when evaluated!
+ * Reports completed pkts/bytes to BQL.
  *
  * Returns:
  *	 0 if the queue was woken up
  *	 1 if the queue was already enabled (or disabled but @down_cond is true)
  *	-1 if the queue was left stopped
  */
-#define __netif_txq_maybe_wake(txq, get_desc, start_thrs, down_cond)	\
+#define __netif_txq_completed_wake(txq, pkts, bytes,			\
+				   get_desc, start_thrs, down_cond)	\
 	({								\
 		int _res;						\
 									\
+		/* Report to BQL and piggy back on its barrier.		\
+		 * Barrier makes sure that anybody stopping the queue	\
+		 * after this point sees the new consumer index.	\
+		 * Pairs with barrier in netif_txq_try_stop().		\
+		 */							\
+		netdev_txq_completed_mb(txq, pkts, bytes);		\
+									\
 		_res = -1;						\
-		if (likely(get_desc > start_thrs)) {			\
-			/* Make sure that anybody stopping the queue after \
-			 * this sees the new next_to_clean.		\
-			 */						\
-			smp_mb();					\
+		if (pkts && likely(get_desc > start_thrs)) {		\
 			_res = 1;					\
 			if (unlikely(netif_tx_queue_stopped(txq)) &&	\
 			    !(down_cond)) {				\
@@ -120,8 +141,8 @@
 		_res;							\
 	})
 
-#define netif_txq_maybe_wake(txq, get_desc, start_thrs)		\
-	__netif_txq_maybe_wake(txq, get_desc, start_thrs, false)
+#define netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs) \
+	__netif_txq_completed_wake(txq, pkts, bytes, get_desc, start_thrs, false)
 
 /* subqueue variants follow */
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code
  2023-04-05 22:31 ` [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
@ 2023-04-06  7:22   ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2023-04-06  7:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb

On Wed, Apr 05, 2023 at 03:31:31PM -0700, Jakub Kicinski wrote:
>
> +/**
> + * __netif_txq_maybe_wake() - locklessly wake a Tx queue, if needed
> + * @txq:	struct netdev_queue to stop/start
> + * @get_desc:	get current number of free descriptors (see requirements below!)
> + * @start_thrs:	minimal number of descriptors to re-enable the queue
> + * @down_cond:	down condition, predicate indicating that the queue should
> + *		not be woken up even if descriptors are available
> + *
> + * All arguments may be evaluated multiple times.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + *
> + * Returns:
> + *	 0 if the queue was woken up
> + *	 1 if the queue was already enabled (or disabled but @down_cond is true)
> + *	-1 if the queue was left stopped

Perhaps we should say left unchanged instead of left stopped.
Just in case someone tries to be pedantic ten years later and
then claims the code to be buggy :)

Otherwise

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-05 22:31 ` [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
@ 2023-04-06  7:35   ` Herbert Xu
  2023-04-07  0:41     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2023-04-06  7:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

On Wed, Apr 05, 2023 at 03:31:34PM -0700, Jakub Kicinski wrote:
>					\
> @@ -82,10 +82,26 @@
>  		_res;							\
>  	})								\
>  
> +/* Variant of netdev_tx_completed_queue() which guarantees smp_mb() if
> + * @bytes != 0, regardless of kernel config.
> + */
> +static inline void
> +netdev_txq_completed_mb(struct netdev_queue *dev_queue,
> +			unsigned int pkts, unsigned int bytes)
> +{
> +#ifdef CONFIG_BQL
> +	netdev_tx_completed_queue(dev_queue, pkts, bytes);
> +#else
> +	if (bytes)
> +		smp_mb();
> +#endif
> +}

Minor nit, I would write this as

	if (IS_ENABLED(CONFIG_BQL))
		netdev_tx_completed_queue(dev_queue, pkts, bytes);
	else if (bytes)
		smp_mb();

Actually, why is this checking bytes while the caller is checking
pkts? Do we need to check them at all? If pkts/bytes is commonly
non-zero, then we should just do a barrier unconditionally and make
the uncommon path pay the penalty.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-06  7:35   ` Herbert Xu
@ 2023-04-07  0:41     ` Jakub Kicinski
  2023-04-12  6:06       ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-07  0:41 UTC (permalink / raw)
  To: Herbert Xu
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

On Thu, 6 Apr 2023 15:35:49 +0800 Herbert Xu wrote:
> Minor nit, I would write this as
> 
> 	if (IS_ENABLED(CONFIG_BQL))
> 		netdev_tx_completed_queue(dev_queue, pkts, bytes);
> 	else if (bytes)
> 		smp_mb();

Will do!

> Actually, why is this checking bytes while the caller is checking
> pkts? Do we need to check them at all? If pkts/bytes is commonly
> non-zero, then we should just do a barrier unconditionally and make
> the uncommon path pay the penalty.

I wanted to keep the same semantics as netdev_tx_completed_queue()
which only barriers if (bytes). Not in the least to make it obvious
to someone looking at the code of netdev_txq_completed_mb() (and not
the comment above it) that it doesn't _always_ put a barrier in.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-07  0:41     ` Jakub Kicinski
@ 2023-04-12  6:06       ` Herbert Xu
  2023-04-12 13:54         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Herbert Xu @ 2023-04-12  6:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote:
>
> I wanted to keep the same semantics as netdev_tx_completed_queue()
> which only barriers if (bytes). Not in the least to make it obvious
> to someone looking at the code of netdev_txq_completed_mb() (and not
> the comment above it) that it doesn't _always_ put a barrier in.

OK, but I think we should instead change netdev_tx_compelted_queue
to do the smp_mb unconditionally.  We should never optimise for the
unlikely case, and it is extremely unlikely for a TX cleanup routine
to wind up with nothing to do.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-12  6:06       ` Herbert Xu
@ 2023-04-12 13:54         ` Jakub Kicinski
  2023-04-13  2:31           ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2023-04-12 13:54 UTC (permalink / raw)
  To: Herbert Xu
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

On Wed, 12 Apr 2023 14:06:34 +0800 Herbert Xu wrote:
> On Thu, Apr 06, 2023 at 05:41:40PM -0700, Jakub Kicinski wrote:
> > I wanted to keep the same semantics as netdev_tx_completed_queue()
> > which only barriers if (bytes). Not in the least to make it obvious
> > to someone looking at the code of netdev_txq_completed_mb() (and not
> > the comment above it) that it doesn't _always_ put a barrier in.  
> 
> OK, but I think we should instead change netdev_tx_compelted_queue
> to do the smp_mb unconditionally.  We should never optimise for the
> unlikely case, and it is extremely unlikely for a TX cleanup routine
> to wind up with nothing to do.

I don't understand what you're trying to argue. The whole point of 
the patch is to use the BQL barrier and BQL returns early, before 
the barrier.

I don't think many people actually build kernels with BQL=n so the other
branch is more *documentation* than it is relevant, executed code.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues
  2023-04-12 13:54         ` Jakub Kicinski
@ 2023-04-13  2:31           ` Herbert Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Herbert Xu @ 2023-04-13  2:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, alexander.duyck, hkallweit1,
	andrew, willemb, michael.chan, jesse.brandeburg,
	anthony.l.nguyen

On Wed, Apr 12, 2023 at 06:54:08AM -0700, Jakub Kicinski wrote:
>
> I don't understand what you're trying to argue. The whole point of 
> the patch is to use the BQL barrier and BQL returns early, before 
> the barrier.

I'm saying that the smp_mb should be unconditional in all cases.
As it stands the only time when smp_mb will be skipped is when the
TX cleaner finds no work to do.  That's an extremely unlikely case
and there is no point in skipping the barrier just for that.

> I don't think many people actually build kernels with BQL=n so the other
> branch is more *documentation* than it is relevant, executed code.

No I'm not referring to BQL, I'm referring to bytes/pkts == 0.

Looking into the git history, the bytes == 0 check wasn't even
meant for the barrier as the barrier was added later.  So it
was simply there to make the BQL code function correctly.  All
we have to do is move the check around the BQL code and then
the smp_mb can be done unconditionally.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-13  2:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-05 22:31 [PATCH net-next v3 0/7] net: lockless stop/wake combo macros Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 1/7] docs: net: reformat driver.rst from a list to sections Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 2/7] docs: net: move the probe and open/close sections of driver.rst up Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 3/7] docs: net: use C syntax highlight in driver.rst Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 4/7] net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski
2023-04-06  7:22   ` Herbert Xu
2023-04-05 22:31 ` [PATCH net-next v3 5/7] ixgbe: use new queue try_stop/try_wake macros Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 6/7] bnxt: " Jakub Kicinski
2023-04-05 22:31 ` [PATCH net-next v3 7/7] net: piggy back on the memory barrier in bql when waking queues Jakub Kicinski
2023-04-06  7:35   ` Herbert Xu
2023-04-07  0:41     ` Jakub Kicinski
2023-04-12  6:06       ` Herbert Xu
2023-04-12 13:54         ` Jakub Kicinski
2023-04-13  2:31           ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.