netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
@ 2022-05-19 15:05 Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 01/11] net: mac802154: Rename the synchronous xmit worker Miquel Raynal
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Hello,

This series brings support for that famous synchronous Tx API for MLME
commands.

MLME commands will be used during scan operations. In this situation,
we need to be sure that all transfers finished and that no transfer
will be queued for a short moment.

Cheers,
Miquèl

Changes in v4:
* Made visible the mlme_tx{_pre,,_post} helpers, used them later in the
  scanning code where relevant.
* Used the atomic_fetch_inc() alternative to only stop the queue when
  necessary.
* Used the netif_running() helper in place of the manual check against
  the IFF_UP netdev flag.
* Changed the error codes to ENETDOWN if the device was closed.
* Reworked the MLME transmissions error path so that they would not keep
  the rtnl taken.
* Updated the logic to avoid erroring out on the mlme_op_pre() call
  which just returns the code of the previous transmission (which we
  likely do not care about here).
* Dropped the queue_stopped variable, used the existing "flags"
  variable, turning it into an unsigned long so that it would accept
  atomic operations. Created a WPAN_PHY_FLAG_STATE_QUEUE_STOPPED
  definition for this purpose.

Changes in v3:
* Tested with lockdep enabled, a more aggressive preemption level and
  the sleeping while atomic warnings enabled.
* Changed the hold/release queue mutex into a spinlock.
* Split the mlme_tx function into three, one to hold the queue, then
  another part that does takes the rtnl and has the real content, and a
  last helper to release the queue.
* Fixed the warning condition in the slow path.
* Used an unsigned long and test/set_bit helpers to follow the queue
  state instead of an atomic_t.

Changes in v2:
* Updated the main tx function error path.
* Added a missing atomic_dec_at_test() call on the hold counter.
* Always called (upon a certain condition) the queue wakeup helper from
  the release queue helper (and similarly in the hold helper) and
  squashed two existing patches in it to simplify the series.
* Introduced a mutex to serialize accesses to the increment/decrement of
  the hold counter and the wake up call.
* Added a warning in case an MLME Tx gets triggered while the device was
  stopped.
* Used the rtnl to ensure the device cannot be stopped while an MLME
  transmission is ongoing.

Changes in v1 since this series got extracted from a bigger change:
* Introduced a new atomic variable to know when the queue is actually
  stopped. So far we only had an atomic to know when the queue was held
  (indicates a transitioning state towards a stopped queue only) and
  another atomic indicating if a transfer was still ongoing at this
  point (used by the wait logic as a condition to wake up).

Miquel Raynal (11):
  net: mac802154: Rename the synchronous xmit worker
  net: mac802154: Rename the main tx_work struct
  net: mac802154: Enhance the error path in the main tx helper
  net: mac802154: Follow the count of ongoing transmissions
  net: mac802154: Bring the ability to hold the transmit queue
  net: mac802154: Create a hot tx path
  net: mac802154: Introduce a helper to disable the queue
  net: mac802154: Introduce a tx queue flushing mechanism
  net: mac802154: Introduce a synchronous API for MLME commands
  net: mac802154: Add a warning in the hot path
  net: mac802154: Add a warning in the slow path

 include/net/cfg802154.h      |  13 +++-
 include/net/mac802154.h      |  27 -------
 net/ieee802154/core.c        |   3 +
 net/mac802154/cfg.c          |   4 +-
 net/mac802154/ieee802154_i.h |  40 +++++++++-
 net/mac802154/main.c         |   2 +-
 net/mac802154/tx.c           | 147 +++++++++++++++++++++++++++++++----
 net/mac802154/util.c         |  71 +++++++++++++++--
 8 files changed, 252 insertions(+), 55 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 01/11] net: mac802154: Rename the synchronous xmit worker
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 02/11] net: mac802154: Rename the main tx_work struct Miquel Raynal
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

There are currently two driver hooks: one is synchronous, the other is
not. We cannot rely on driver implementations to provide a synchronous
API (which is related to the bus medium more than a wish to have a
synchronized implementation) so we are going to introduce a sync API
above any kind of driver transmit function. In order to clarify what
this worker is for (synchronous driver implementation), let's rename it
so that people don't get bothered by the fact that their driver does not
make use of the "xmit worker" which is a too generic name.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 2 +-
 net/mac802154/main.c         | 2 +-
 net/mac802154/tx.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index 1381e6a5e180..d7632c6d225f 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -123,7 +123,7 @@ ieee802154_sdata_running(struct ieee802154_sub_if_data *sdata)
 extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
-void ieee802154_xmit_worker(struct work_struct *work);
+void ieee802154_xmit_sync_worker(struct work_struct *work);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index bd7bdb1219dd..392771bba9dd 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -95,7 +95,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
 
 	skb_queue_head_init(&local->skb_queue);
 
-	INIT_WORK(&local->tx_work, ieee802154_xmit_worker);
+	INIT_WORK(&local->tx_work, ieee802154_xmit_sync_worker);
 
 	/* init supported flags with 802.15.4 default ranges */
 	phy->supported.max_minbe = 8;
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index c829e4a75325..97df5985b830 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -22,7 +22,7 @@
 #include "ieee802154_i.h"
 #include "driver-ops.h"
 
-void ieee802154_xmit_worker(struct work_struct *work)
+void ieee802154_xmit_sync_worker(struct work_struct *work)
 {
 	struct ieee802154_local *local =
 		container_of(work, struct ieee802154_local, tx_work);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 02/11] net: mac802154: Rename the main tx_work struct
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 01/11] net: mac802154: Rename the synchronous xmit worker Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 03/11] net: mac802154: Enhance the error path in the main tx helper Miquel Raynal
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

This entry is dedicated to synchronous transmissions done by drivers
without async hook. Make this clearer that this is not a work that any
driver can use by at least prefixing it with "sync_". While at it, let's
enhance the comment explaining why we choose one or the other.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 2 +-
 net/mac802154/main.c         | 2 +-
 net/mac802154/tx.c           | 9 ++++++---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index d7632c6d225f..a8b7b9049f14 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -55,7 +55,7 @@ struct ieee802154_local {
 	struct sk_buff_head skb_queue;
 
 	struct sk_buff *tx_skb;
-	struct work_struct tx_work;
+	struct work_struct sync_tx_work;
 	/* A negative Linux error code or a null/positive MLME error status */
 	int tx_result;
 };
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 392771bba9dd..40fab08df24b 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -95,7 +95,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
 
 	skb_queue_head_init(&local->skb_queue);
 
-	INIT_WORK(&local->tx_work, ieee802154_xmit_sync_worker);
+	INIT_WORK(&local->sync_tx_work, ieee802154_xmit_sync_worker);
 
 	/* init supported flags with 802.15.4 default ranges */
 	phy->supported.max_minbe = 8;
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 97df5985b830..a01689ddd547 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -25,7 +25,7 @@
 void ieee802154_xmit_sync_worker(struct work_struct *work)
 {
 	struct ieee802154_local *local =
-		container_of(work, struct ieee802154_local, tx_work);
+		container_of(work, struct ieee802154_local, sync_tx_work);
 	struct sk_buff *skb = local->tx_skb;
 	struct net_device *dev = skb->dev;
 	int res;
@@ -76,7 +76,10 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	/* Stop the netif queue on each sub_if_data object. */
 	ieee802154_stop_queue(&local->hw);
 
-	/* async is priority, otherwise sync is fallback */
+	/* Drivers should preferably implement the async callback. In some rare
+	 * cases they only provide a sync callback which we will use as a
+	 * fallback.
+	 */
 	if (local->ops->xmit_async) {
 		unsigned int len = skb->len;
 
@@ -90,7 +93,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 		dev->stats.tx_bytes += len;
 	} else {
 		local->tx_skb = skb;
-		queue_work(local->workqueue, &local->tx_work);
+		queue_work(local->workqueue, &local->sync_tx_work);
 	}
 
 	return NETDEV_TX_OK;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 03/11] net: mac802154: Enhance the error path in the main tx helper
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 01/11] net: mac802154: Rename the synchronous xmit worker Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 02/11] net: mac802154: Rename the main tx_work struct Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 04/11] net: mac802154: Follow the count of ongoing transmissions Miquel Raynal
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Before adding more logic in the error path, let's move the wake queue
call there, rename the default label and create an additional one.

There is no functional change.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index a01689ddd547..4a46ce8d2ac8 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -65,7 +65,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 				consume_skb(skb);
 				skb = nskb;
 			} else {
-				goto err_tx;
+				goto err_free_skb;
 			}
 		}
 
@@ -84,10 +84,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 		unsigned int len = skb->len;
 
 		ret = drv_xmit_async(local, skb);
-		if (ret) {
-			ieee802154_wake_queue(&local->hw);
-			goto err_tx;
-		}
+		if (ret)
+			goto err_wake_netif_queue;
 
 		dev->stats.tx_packets++;
 		dev->stats.tx_bytes += len;
@@ -98,7 +96,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 	return NETDEV_TX_OK;
 
-err_tx:
+err_wake_netif_queue:
+	ieee802154_wake_queue(&local->hw);
+err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 04/11] net: mac802154: Follow the count of ongoing transmissions
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (2 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 03/11] net: mac802154: Enhance the error path in the main tx helper Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 05/11] net: mac802154: Bring the ability to hold the transmit queue Miquel Raynal
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

In order to create a synchronous API for MLME command purposes, we need
to be able to track the end of the ongoing transmissions. Let's
introduce an atomic variable which is incremented when a transmission
starts and decremented when relevant so that we know at any moment
whether there is an ongoing transmission.

The counter gets decremented in the following situations:
- The operation is asynchronous and there was a failure during the
  offloading process.
- The operation is synchronous and the synchronous operation failed.
- The operation finished, either successfully or not.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h | 3 +++
 net/mac802154/tx.c      | 3 +++
 net/mac802154/util.c    | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 85f9e8417688..473ebcb9b155 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -214,6 +214,9 @@ struct wpan_phy {
 	/* the network namespace this phy lives in currently */
 	possible_net_t _net;
 
+	/* Transmission monitoring */
+	atomic_t ongoing_txs;
+
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 4a46ce8d2ac8..33f64ecd96c7 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -44,6 +44,7 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
 }
@@ -75,6 +76,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 	/* Stop the netif queue on each sub_if_data object. */
 	ieee802154_stop_queue(&local->hw);
+	atomic_inc(&local->phy->ongoing_txs);
 
 	/* Drivers should preferably implement the async callback. In some rare
 	 * cases they only provide a sync callback which we will use as a
@@ -98,6 +100,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 err_wake_netif_queue:
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 9f024d85563b..76dc663e2af4 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -88,6 +88,7 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 	}
 
 	dev_consume_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_complete);
 
@@ -99,6 +100,7 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	local->tx_result = reason;
 	ieee802154_wake_queue(hw);
 	dev_kfree_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_error);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 05/11] net: mac802154: Bring the ability to hold the transmit queue
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (3 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 04/11] net: mac802154: Follow the count of ongoing transmissions Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 06/11] net: mac802154: Create a hot tx path Miquel Raynal
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Create a hold_txs atomic variable and increment/decrement it when
relevant, ie. when we want to hold the queue or release it: currently
all the "stopped" situations are suitable, but very soon we will more
extensively use this feature for MLME purposes.

Upon release, the atomic counter is decremented and checked. If it is
back to 0, then the netif queue gets woken up. This makes the whole
process fully transparent, provided that all the users of
ieee802154_wake/stop_queue() now call ieee802154_hold/release_queue()
instead.

In no situation individual drivers should call any of these helpers
manually in order to avoid messing with the counters. There are other
functions more suited for this purpose which have been introduced, such
as the _xmit_complete() and _xmit_error() helpers which will handle all
that for them.

One advantage is that, as no more drivers call the stop/wake helpers
directly, we can safely stop exporting them and only declare the
hold/release ones in a header only accessible to the core.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h      |  6 +++--
 include/net/mac802154.h      | 27 -------------------
 net/ieee802154/core.c        |  2 ++
 net/mac802154/cfg.c          |  4 +--
 net/mac802154/ieee802154_i.h | 19 +++++++++++++
 net/mac802154/tx.c           |  6 ++---
 net/mac802154/util.c         | 52 +++++++++++++++++++++++++++++++-----
 7 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 473ebcb9b155..7a191418f258 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -11,7 +11,7 @@
 
 #include <linux/ieee802154.h>
 #include <linux/netdevice.h>
-#include <linux/mutex.h>
+#include <linux/spinlock.h>
 #include <linux/bug.h>
 
 #include <net/nl802154.h>
@@ -214,8 +214,10 @@ struct wpan_phy {
 	/* the network namespace this phy lives in currently */
 	possible_net_t _net;
 
-	/* Transmission monitoring */
+	/* Transmission monitoring and control */
+	spinlock_t queue_lock;
 	atomic_t ongoing_txs;
+	atomic_t hold_txs;
 
 	char priv[] __aligned(NETDEV_ALIGN);
 };
diff --git a/include/net/mac802154.h b/include/net/mac802154.h
index bdac0ddbdcdb..357d25ef627a 100644
--- a/include/net/mac802154.h
+++ b/include/net/mac802154.h
@@ -460,33 +460,6 @@ void ieee802154_unregister_hw(struct ieee802154_hw *hw);
  */
 void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb,
 			   u8 lqi);
-/**
- * ieee802154_wake_queue - wake ieee802154 queue
- * @hw: pointer as obtained from ieee802154_alloc_hw().
- *
- * Tranceivers usually have either one transmit framebuffer or one framebuffer
- * for both transmitting and receiving. Hence, the core currently only handles
- * one frame at a time for each phy, which means we had to stop the queue to
- * avoid new skb to come during the transmission. The queue then needs to be
- * woken up after the operation.
- *
- * Drivers should use this function instead of netif_wake_queue.
- */
-void ieee802154_wake_queue(struct ieee802154_hw *hw);
-
-/**
- * ieee802154_stop_queue - stop ieee802154 queue
- * @hw: pointer as obtained from ieee802154_alloc_hw().
- *
- * Tranceivers usually have either one transmit framebuffer or one framebuffer
- * for both transmitting and receiving. Hence, the core currently only handles
- * one frame at a time for each phy, which means we need to tell upper layers to
- * stop giving us new skbs while we are busy with the transmitted one. The queue
- * must then be stopped before transmitting.
- *
- * Drivers should use this function instead of netif_stop_queue.
- */
-void ieee802154_stop_queue(struct ieee802154_hw *hw);
 
 /**
  * ieee802154_xmit_complete - frame transmission complete
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index de259b5170ab..47a4de6df88b 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -130,6 +130,8 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
 
 	init_waitqueue_head(&rdev->dev_wait);
 
+	spin_lock_init(&rdev->wpan_phy.queue_lock);
+
 	return &rdev->wpan_phy;
 }
 EXPORT_SYMBOL(wpan_phy_new);
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index 1e4a9f74ed43..b51100fd9e3f 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -46,7 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
 	if (!local->open_count)
 		goto suspend;
 
-	ieee802154_stop_queue(&local->hw);
+	ieee802154_hold_queue(local);
 	synchronize_net();
 
 	/* stop hardware - this must stop RX */
@@ -72,7 +72,7 @@ static int ieee802154_resume(struct wpan_phy *wpan_phy)
 		return ret;
 
 wake_up:
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	local->suspended = false;
 	return 0;
 }
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index a8b7b9049f14..0c7ff9e0b632 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -130,6 +130,25 @@ netdev_tx_t
 ieee802154_subif_start_xmit(struct sk_buff *skb, struct net_device *dev);
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer);
 
+/**
+ * ieee802154_hold_queue - hold ieee802154 queue
+ * @local: main mac object
+ *
+ * Hold a queue by incrementing an atomic counter and requesting the netif
+ * queues to be stopped. The queues cannot be woken up while the counter has not
+ * been reset with as any ieee802154_release_queue() calls as needed.
+ */
+void ieee802154_hold_queue(struct ieee802154_local *local);
+
+/**
+ * ieee802154_release_queue - release ieee802154 queue
+ * @local: main mac object
+ *
+ * Release a queue which is held by decrementing an atomic counter and wake it
+ * up only if the counter reaches 0.
+ */
+void ieee802154_release_queue(struct ieee802154_local *local);
+
 /* MIB callbacks */
 void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);
 
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 33f64ecd96c7..6a53c83cf039 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -43,7 +43,7 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	atomic_dec(&local->phy->ongoing_txs);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
@@ -75,7 +75,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	}
 
 	/* Stop the netif queue on each sub_if_data object. */
-	ieee802154_stop_queue(&local->hw);
+	ieee802154_hold_queue(local);
 	atomic_inc(&local->phy->ongoing_txs);
 
 	/* Drivers should preferably implement the async callback. In some rare
@@ -99,7 +99,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	return NETDEV_TX_OK;
 
 err_wake_netif_queue:
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	atomic_dec(&local->phy->ongoing_txs);
 err_free_skb:
 	kfree_skb(skb);
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 76dc663e2af4..0ed8b5bcbe8a 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -13,7 +13,17 @@
 /* privid for wpan_phys to determine whether they belong to us or not */
 const void *const mac802154_wpan_phy_privid = &mac802154_wpan_phy_privid;
 
-void ieee802154_wake_queue(struct ieee802154_hw *hw)
+/**
+ * ieee802154_wake_queue - wake ieee802154 queue
+ * @local: main mac object
+ *
+ * Tranceivers usually have either one transmit framebuffer or one framebuffer
+ * for both transmitting and receiving. Hence, the core currently only handles
+ * one frame at a time for each phy, which means we had to stop the queue to
+ * avoid new skb to come during the transmission. The queue then needs to be
+ * woken up after the operation.
+ */
+static void ieee802154_wake_queue(struct ieee802154_hw *hw)
 {
 	struct ieee802154_local *local = hw_to_local(hw);
 	struct ieee802154_sub_if_data *sdata;
@@ -27,9 +37,18 @@ void ieee802154_wake_queue(struct ieee802154_hw *hw)
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(ieee802154_wake_queue);
 
-void ieee802154_stop_queue(struct ieee802154_hw *hw)
+/**
+ * ieee802154_stop_queue - stop ieee802154 queue
+ * @local: main mac object
+ *
+ * Tranceivers usually have either one transmit framebuffer or one framebuffer
+ * for both transmitting and receiving. Hence, the core currently only handles
+ * one frame at a time for each phy, which means we need to tell upper layers to
+ * stop giving us new skbs while we are busy with the transmitted one. The queue
+ * must then be stopped before transmitting.
+ */
+static void ieee802154_stop_queue(struct ieee802154_hw *hw)
 {
 	struct ieee802154_local *local = hw_to_local(hw);
 	struct ieee802154_sub_if_data *sdata;
@@ -43,14 +62,33 @@ void ieee802154_stop_queue(struct ieee802154_hw *hw)
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(ieee802154_stop_queue);
+
+void ieee802154_hold_queue(struct ieee802154_local *local)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&local->phy->queue_lock, flags);
+	if (!atomic_fetch_inc(&local->phy->hold_txs))
+		ieee802154_stop_queue(&local->hw);
+	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
+}
+
+void ieee802154_release_queue(struct ieee802154_local *local)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&local->phy->queue_lock, flags);
+	if (!atomic_dec_and_test(&local->phy->hold_txs))
+		ieee802154_wake_queue(&local->hw);
+	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
+}
 
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer)
 {
 	struct ieee802154_local *local =
 		container_of(timer, struct ieee802154_local, ifs_timer);
 
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 
 	return HRTIMER_NORESTART;
 }
@@ -84,7 +122,7 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 				      hw->phy->sifs_period * NSEC_PER_USEC,
 				      HRTIMER_MODE_REL);
 	} else {
-		ieee802154_wake_queue(hw);
+		ieee802154_release_queue(local);
 	}
 
 	dev_consume_skb_any(skb);
@@ -98,7 +136,7 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	struct ieee802154_local *local = hw_to_local(hw);
 
 	local->tx_result = reason;
-	ieee802154_wake_queue(hw);
+	ieee802154_release_queue(local);
 	dev_kfree_skb_any(skb);
 	atomic_dec(&hw->phy->ongoing_txs);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 06/11] net: mac802154: Create a hot tx path
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (4 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 05/11] net: mac802154: Bring the ability to hold the transmit queue Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 07/11] net: mac802154: Introduce a helper to disable the queue Miquel Raynal
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Let's rename the current Tx path to show that this is the "hot" Tx
path. We will soon introduce a slower Tx path for MLME commands.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 6a53c83cf039..607019b8f8ab 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -106,6 +106,12 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t
+ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	return ieee802154_tx(local, skb);
+}
+
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -113,7 +119,7 @@ ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb->skb_iif = dev->ifindex;
 
-	return ieee802154_tx(sdata->local, skb);
+	return ieee802154_hot_tx(sdata->local, skb);
 }
 
 netdev_tx_t
@@ -135,5 +141,5 @@ ieee802154_subif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb->skb_iif = dev->ifindex;
 
-	return ieee802154_tx(sdata->local, skb);
+	return ieee802154_hot_tx(sdata->local, skb);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 07/11] net: mac802154: Introduce a helper to disable the queue
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (5 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 06/11] net: mac802154: Create a hot tx path Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 08/11] net: mac802154: Introduce a tx queue flushing mechanism Miquel Raynal
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Sometimes calling the stop queue helper is not enough because it does
not hold any lock. In order to be safe and avoid racy situations when
trying to (soon) sync the Tx queue, for instance before sending an MLME
frame, let's now introduce an helper which actually hold the necessary
locks when doing so.

Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 12 ++++++++++++
 net/mac802154/util.c         | 14 ++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index 0c7ff9e0b632..e34db1d49ef4 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -149,6 +149,18 @@ void ieee802154_hold_queue(struct ieee802154_local *local);
  */
 void ieee802154_release_queue(struct ieee802154_local *local);
 
+/**
+ * ieee802154_disable_queue - disable ieee802154 queue
+ * @local: main mac object
+ *
+ * When trying to sync the Tx queue, we cannot just stop the queue
+ * (which is basically a bit being set without proper lock handling)
+ * because it would be racy. We actually need to call netif_tx_disable()
+ * instead, which is done by this helper. Restarting the queue can
+ * however still be done with a regular wake call.
+ */
+void ieee802154_disable_queue(struct ieee802154_local *local);
+
 /* MIB callbacks */
 void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);
 
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 0ed8b5bcbe8a..999534f64485 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -83,6 +83,20 @@ void ieee802154_release_queue(struct ieee802154_local *local)
 	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
 }
 
+void ieee802154_disable_queue(struct ieee802154_local *local)
+{
+	struct ieee802154_sub_if_data *sdata;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		if (!sdata->dev)
+			continue;
+
+		netif_tx_disable(sdata->dev);
+	}
+	rcu_read_unlock();
+}
+
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer)
 {
 	struct ieee802154_local *local =
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 08/11] net: mac802154: Introduce a tx queue flushing mechanism
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (6 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 07/11] net: mac802154: Introduce a helper to disable the queue Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 09/11] net: mac802154: Introduce a synchronous API for MLME commands Miquel Raynal
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Right now we are able to stop a queue but we have no indication if a
transmission is ongoing or not.

Thanks to recent additions, we can track the number of ongoing
transmissions so we know if the last transmission is over. Adding on top
of it an internal wait queue also allows to be woken up asynchronously
when this happens. If, beforehands, we marked the queue to be held and
stopped it, we end up flushing and stopping the tx queue.

Thanks to this feature, we will soon be able to introduce a synchronous
transmit API.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h      |  1 +
 net/ieee802154/core.c        |  1 +
 net/mac802154/cfg.c          |  2 +-
 net/mac802154/ieee802154_i.h |  1 +
 net/mac802154/tx.c           | 26 ++++++++++++++++++++++++--
 net/mac802154/util.c         |  6 ++++--
 6 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 7a191418f258..8881b6126b58 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -218,6 +218,7 @@ struct wpan_phy {
 	spinlock_t queue_lock;
 	atomic_t ongoing_txs;
 	atomic_t hold_txs;
+	wait_queue_head_t sync_txq;
 
 	char priv[] __aligned(NETDEV_ALIGN);
 };
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index 47a4de6df88b..57546e07e06a 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -129,6 +129,7 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
 	wpan_phy_net_set(&rdev->wpan_phy, &init_net);
 
 	init_waitqueue_head(&rdev->dev_wait);
+	init_waitqueue_head(&rdev->wpan_phy.sync_txq);
 
 	spin_lock_init(&rdev->wpan_phy.queue_lock);
 
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index b51100fd9e3f..93df24f75572 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -46,7 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
 	if (!local->open_count)
 		goto suspend;
 
-	ieee802154_hold_queue(local);
+	ieee802154_sync_and_hold_queue(local);
 	synchronize_net();
 
 	/* stop hardware - this must stop RX */
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index e34db1d49ef4..a057827fc48a 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -124,6 +124,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
 void ieee802154_xmit_sync_worker(struct work_struct *work);
+int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 607019b8f8ab..38f74b8b6740 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -44,7 +44,8 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
 	ieee802154_release_queue(local);
-	atomic_dec(&local->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&local->phy->ongoing_txs))
+		wake_up(&local->phy->sync_txq);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
 }
@@ -100,12 +101,33 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 err_wake_netif_queue:
 	ieee802154_release_queue(local);
-	atomic_dec(&local->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&local->phy->ongoing_txs))
+		wake_up(&local->phy->sync_txq);
 err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
 
+static int ieee802154_sync_queue(struct ieee802154_local *local)
+{
+	int ret;
+
+	ieee802154_hold_queue(local);
+	ieee802154_disable_queue(local);
+	wait_event(local->phy->sync_txq, !atomic_read(&local->phy->ongoing_txs));
+	ret = local->tx_result;
+	ieee802154_release_queue(local);
+
+	return ret;
+}
+
+int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
+{
+	ieee802154_hold_queue(local);
+
+	return ieee802154_sync_queue(local);
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 999534f64485..5e1fcc7b0123 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -140,7 +140,8 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 	}
 
 	dev_consume_skb_any(skb);
-	atomic_dec(&hw->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&hw->phy->ongoing_txs))
+		wake_up(&hw->phy->sync_txq);
 }
 EXPORT_SYMBOL(ieee802154_xmit_complete);
 
@@ -152,7 +153,8 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	local->tx_result = reason;
 	ieee802154_release_queue(local);
 	dev_kfree_skb_any(skb);
-	atomic_dec(&hw->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&hw->phy->ongoing_txs))
+		wake_up(&hw->phy->sync_txq);
 }
 EXPORT_SYMBOL(ieee802154_xmit_error);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 09/11] net: mac802154: Introduce a synchronous API for MLME commands
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (7 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 08/11] net: mac802154: Introduce a tx queue flushing mechanism Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 10/11] net: mac802154: Add a warning in the hot path Miquel Raynal
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

This is the slow path, we need to wait for each command to be processed
before continuing so let's introduce an helper which does the
transmission and blocks until it gets notified of its asynchronous
completion. This helper is going to be used when introducing scan
support.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h |  4 ++++
 net/mac802154/tx.c           | 44 ++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index a057827fc48a..8a4816ae71e7 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -125,6 +125,10 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
 void ieee802154_xmit_sync_worker(struct work_struct *work);
 int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
+int ieee802154_mlme_op_pre(struct ieee802154_local *local);
+int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb);
+void ieee802154_mlme_op_post(struct ieee802154_local *local);
+int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 38f74b8b6740..4827391600f6 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -128,6 +128,50 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 	return ieee802154_sync_queue(local);
 }
 
+int ieee802154_mlme_op_pre(struct ieee802154_local *local)
+{
+	return ieee802154_sync_and_hold_queue(local);
+}
+
+int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	int ret;
+
+	/* Avoid possible calls to ->ndo_stop() when we asynchronously perform
+	 * MLME transmissions.
+	 */
+	rtnl_lock();
+
+	/* Ensure the device was not stopped, otherwise error out */
+	if (!local->open_count) {
+		rtnl_unlock();
+		return -ENETDOWN;
+	}
+
+	ieee802154_tx(local, skb);
+	ret = ieee802154_sync_queue(local);
+
+	rtnl_unlock();
+
+	return ret;
+}
+
+void ieee802154_mlme_op_post(struct ieee802154_local *local)
+{
+	ieee802154_release_queue(local);
+}
+
+int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	int ret;
+
+	ieee802154_mlme_op_pre(local);
+	ret = ieee802154_mlme_tx(local, skb);
+	ieee802154_mlme_op_post(local);
+
+	return ret;
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 10/11] net: mac802154: Add a warning in the hot path
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (8 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 09/11] net: mac802154: Introduce a synchronous API for MLME commands Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-05-19 15:05 ` [PATCH wpan-next v4 11/11] net: mac802154: Add a warning in the slow path Miquel Raynal
  2022-06-01  3:30 ` [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Alexander Aring
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

We should never start a transmission after the queue has been stopped.

But because it might work we don't kill the function here but rather
warn loudly the user that something is wrong.

Set a flag when the queue should remain stopped. Reset this flag when
the queue actually gets restarded. Just check this value to know if a
transmission is legitimate, warn if it is not.

Turn the flags variable into an unsigned long to allow the use of atomic
helpers on it.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h |  5 ++++-
 net/mac802154/tx.c      | 16 +++++++++++++++-
 net/mac802154/util.c    |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 8881b6126b58..04b996895fc1 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -166,11 +166,14 @@ wpan_phy_cca_cmp(const struct wpan_phy_cca *a, const struct wpan_phy_cca *b)
  *	level setting.
  * @WPAN_PHY_FLAG_CCA_MODE: Indicates that transceiver will support cca mode
  *	setting.
+ * @WPAN_PHY_FLAG_STATE_QUEUE_STOPPED: Indicates that the transmit queue was
+ *	temporarily stopped.
  */
 enum wpan_phy_flags {
 	WPAN_PHY_FLAG_TXPOWER		= BIT(1),
 	WPAN_PHY_FLAG_CCA_ED_LEVEL	= BIT(2),
 	WPAN_PHY_FLAG_CCA_MODE		= BIT(3),
+	WPAN_PHY_FLAG_STATE_QUEUE_STOPPED = BIT(4),
 };
 
 struct wpan_phy {
@@ -182,7 +185,7 @@ struct wpan_phy {
 	 */
 	const void *privid;
 
-	u32 flags;
+	unsigned long flags;
 
 	/*
 	 * This is a PIB according to 802.15.4-2011.
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 4827391600f6..6188f42276e7 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -123,9 +123,13 @@ static int ieee802154_sync_queue(struct ieee802154_local *local)
 
 int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 {
+	int ret;
+
 	ieee802154_hold_queue(local);
+	ret = ieee802154_sync_queue(local);
+	set_bit(WPAN_PHY_FLAG_STATE_QUEUE_STOPPED, &local->phy->flags);
 
-	return ieee802154_sync_queue(local);
+	return ret;
 }
 
 int ieee802154_mlme_op_pre(struct ieee802154_local *local)
@@ -172,9 +176,19 @@ int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb)
 	return ret;
 }
 
+static bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
+{
+	return test_bit(WPAN_PHY_FLAG_STATE_QUEUE_STOPPED, &local->phy->flags);
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
+	/* Warn if the net interface tries to transmit frames while the
+	 * ieee802154 core assumes the queue is stopped.
+	 */
+	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
+
 	return ieee802154_tx(local, skb);
 }
 
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 5e1fcc7b0123..60eb7bd3bfc1 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -29,6 +29,7 @@ static void ieee802154_wake_queue(struct ieee802154_hw *hw)
 	struct ieee802154_sub_if_data *sdata;
 
 	rcu_read_lock();
+	clear_bit(WPAN_PHY_FLAG_STATE_QUEUE_STOPPED, &local->phy->flags);
 	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 		if (!sdata->dev)
 			continue;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH wpan-next v4 11/11] net: mac802154: Add a warning in the slow path
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (9 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 10/11] net: mac802154: Add a warning in the hot path Miquel Raynal
@ 2022-05-19 15:05 ` Miquel Raynal
  2022-06-01  3:30 ` [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Alexander Aring
  11 siblings, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-05-19 15:05 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

In order to be able to detect possible conflicts between the net
interface core and the ieee802154 core, let's add a warning in the slow
path: we want to be sure that whenever we start an asynchronous MLME
transmission (which can be fully asynchronous) the net core somehow
agrees that this transmission is possible, ie. the device was not
stopped. Warning in this case would allow us to track down more easily
possible issues with the MLME logic if we ever get reports.

Unlike in the hot path, such a situation cannot be handled.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 6188f42276e7..5b471e932271 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -132,6 +132,25 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 	return ret;
 }
 
+static bool ieee802154_netif_is_down(struct ieee802154_local *local)
+{
+	struct ieee802154_sub_if_data *sdata;
+	bool is_down = true;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		if (!sdata->dev)
+			continue;
+
+		is_down = !netif_running(sdata->dev);
+		if (is_down)
+			break;
+	}
+	rcu_read_unlock();
+
+	return is_down;
+}
+
 int ieee802154_mlme_op_pre(struct ieee802154_local *local)
 {
 	return ieee802154_sync_and_hold_queue(local);
@@ -152,6 +171,14 @@ int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
 		return -ENETDOWN;
 	}
 
+	/* Warn if the ieee802154 core thinks MLME frames can be sent while the
+	 * net interface expects this cannot happen.
+	 */
+	if (WARN_ON_ONCE(ieee802154_netif_is_down(local))) {
+		rtnl_unlock();
+		return -ENETDOWN;
+	}
+
 	ieee802154_tx(local, skb);
 	ret = ieee802154_sync_queue(local);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
                   ` (10 preceding siblings ...)
  2022-05-19 15:05 ` [PATCH wpan-next v4 11/11] net: mac802154: Add a warning in the slow path Miquel Raynal
@ 2022-06-01  3:30 ` Alexander Aring
  2022-06-01  6:12   ` Miquel Raynal
  2022-06-01 21:01   ` Stefan Schmidt
  11 siblings, 2 replies; 19+ messages in thread
From: Alexander Aring @ 2022-06-01  3:30 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Alexander Aring, Stefan Schmidt, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi,

On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
<miquel.raynal@bootlin.com> wrote:
>
> Hello,
>
> This series brings support for that famous synchronous Tx API for MLME
> commands.
>
> MLME commands will be used during scan operations. In this situation,
> we need to be sure that all transfers finished and that no transfer
> will be queued for a short moment.
>

Acked-by: Alexander Aring <aahringo@redhat.com>

There will be now functions upstream which will never be used, Stefan
should wait until they are getting used before sending it to net-next.

- Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-01  3:30 ` [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Alexander Aring
@ 2022-06-01  6:12   ` Miquel Raynal
  2022-06-01 21:01   ` Stefan Schmidt
  1 sibling, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-06-01  6:12 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Alexander Aring, Stefan Schmidt, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi Alexander,

aahringo@redhat.com wrote on Tue, 31 May 2022 23:30:25 -0400:

> Hi,
> 
> On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
> >
> > Hello,
> >
> > This series brings support for that famous synchronous Tx API for MLME
> > commands.
> >
> > MLME commands will be used during scan operations. In this situation,
> > we need to be sure that all transfers finished and that no transfer
> > will be queued for a short moment.
> >  
> 
> Acked-by: Alexander Aring <aahringo@redhat.com>
> 
> There will be now functions upstream which will never be used, Stefan
> should wait until they are getting used before sending it to net-next.

That's right.

Thanks for all the feedback so far!
Miquèl

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-01  3:30 ` [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Alexander Aring
  2022-06-01  6:12   ` Miquel Raynal
@ 2022-06-01 21:01   ` Stefan Schmidt
  2022-06-03 17:55     ` Miquel Raynal
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Schmidt @ 2022-06-01 21:01 UTC (permalink / raw)
  To: Alexander Aring, Miquel Raynal
  Cc: Alexander Aring, linux-wpan - ML, David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni,
	David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development


Hello.

On 01.06.22 05:30, Alexander Aring wrote:
> Hi,
> 
> On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> <miquel.raynal@bootlin.com> wrote:
>>
>> Hello,
>>
>> This series brings support for that famous synchronous Tx API for MLME
>> commands.
>>
>> MLME commands will be used during scan operations. In this situation,
>> we need to be sure that all transfers finished and that no transfer
>> will be queued for a short moment.
>>
> 
> Acked-by: Alexander Aring <aahringo@redhat.com>

These patches have been applied to the wpan-next tree. Thanks!

> There will be now functions upstream which will never be used, Stefan
> should wait until they are getting used before sending it to net-next.

Indeed this can wait until we have a consumer of the functions before 
pushing this forward to net-next. Pretty sure Miquel is happy to finally 
move on to other pieces of his puzzle and use them. :-)

regards
Stefan Schmidt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-01 21:01   ` Stefan Schmidt
@ 2022-06-03 17:55     ` Miquel Raynal
  2022-06-04  1:50       ` Alexander Aring
  0 siblings, 1 reply; 19+ messages in thread
From: Miquel Raynal @ 2022-06-03 17:55 UTC (permalink / raw)
  To: Stefan Schmidt
  Cc: Alexander Aring, Alexander Aring, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi Stefan, Alex,

stefan@datenfreihafen.org wrote on Wed, 1 Jun 2022 23:01:51 +0200:

> Hello.
> 
> On 01.06.22 05:30, Alexander Aring wrote:
> > Hi,
> > 
> > On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> > <miquel.raynal@bootlin.com> wrote:  
> >>
> >> Hello,
> >>
> >> This series brings support for that famous synchronous Tx API for MLME
> >> commands.
> >>
> >> MLME commands will be used during scan operations. In this situation,
> >> we need to be sure that all transfers finished and that no transfer
> >> will be queued for a short moment.
> >>  
> > 
> > Acked-by: Alexander Aring <aahringo@redhat.com>  
> 
> These patches have been applied to the wpan-next tree. Thanks!
> 
> > There will be now functions upstream which will never be used, Stefan
> > should wait until they are getting used before sending it to net-next.  
> 
> Indeed this can wait until we have a consumer of the functions before pushing this forward to net-next. Pretty sure Miquel is happy to finally move on to other pieces of his puzzle and use them. :-)

Next part is coming!

In the mean time I've experienced a new lockdep warning:

All the netlink commands are executed with the rtnl taken.
In my current implementation, when I configure/edit a scan request or a
beacon request I take a scan_lock or a beacons_lock, so they may only
be taken after the rtnl in this case, which leads to this sequence of
events:
- the rtnl is taken (by the net core)
- the beacon's lock is taken

But now in a beacon's work or an active scan work, what happens is:
- work gets woken up
- the beacon/scan lock is taken
- a beacon/beacon-request frame is transmitted
- the rtnl lock is taken during this transmission

Lockdep then detects a possible circular dependency:
[  490.153387]        CPU0                    CPU1
[  490.153391]        ----                    ----
[  490.153394]   lock(&local->beacons_lock);
[  490.153400]                                lock(rtnl_mutex);
[  490.153406]                                lock(&local->beacons_lock);
[  490.153412]   lock(rtnl_mutex);

So in practice, I always need to have the rtnl lock taken when
acquiring these other locks (beacon/scan_lock) which I think is far
from optimal.

1# One solution is to drop the beacons/scan locks because they are not
useful anymore and simply rely on the rtnl.

2# Another solution would be to change the mlme_tx() implementation to
finally not need the rtnl at all.

Note that just calling ASSERT_RTNL() makes no difference in 2#, it
still means that I always need to acquire the rtnl before acquiring the
beacons/scan locks, which greatly reduces their usefulness and leads to
solution 1# in the end.

IIRC I decided to introduce the rtnl to avoid ->ndo_stop() calls during
an MLME transmission. I don't know if it has another use there. If not,
we may perhaps get rid of the rtnl in mlme_tx() by really handling the
stop calls (but I was too lazy so far to do that).

What direction would you advise?

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-03 17:55     ` Miquel Raynal
@ 2022-06-04  1:50       ` Alexander Aring
  2022-06-06 17:03         ` Miquel Raynal
  2022-06-17 14:20         ` Miquel Raynal
  0 siblings, 2 replies; 19+ messages in thread
From: Alexander Aring @ 2022-06-04  1:50 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, Alexander Aring, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi,

On Fri, Jun 3, 2022 at 1:55 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Stefan, Alex,
>
> stefan@datenfreihafen.org wrote on Wed, 1 Jun 2022 23:01:51 +0200:
>
> > Hello.
> >
> > On 01.06.22 05:30, Alexander Aring wrote:
> > > Hi,
> > >
> > > On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> > > <miquel.raynal@bootlin.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >> This series brings support for that famous synchronous Tx API for MLME
> > >> commands.
> > >>
> > >> MLME commands will be used during scan operations. In this situation,
> > >> we need to be sure that all transfers finished and that no transfer
> > >> will be queued for a short moment.
> > >>
> > >
> > > Acked-by: Alexander Aring <aahringo@redhat.com>
> >
> > These patches have been applied to the wpan-next tree. Thanks!
> >
> > > There will be now functions upstream which will never be used, Stefan
> > > should wait until they are getting used before sending it to net-next.
> >
> > Indeed this can wait until we have a consumer of the functions before pushing this forward to net-next. Pretty sure Miquel is happy to finally move on to other pieces of his puzzle and use them. :-)
>
> Next part is coming!
>
> In the mean time I've experienced a new lockdep warning:
>
> All the netlink commands are executed with the rtnl taken.
> In my current implementation, when I configure/edit a scan request or a
> beacon request I take a scan_lock or a beacons_lock, so they may only
> be taken after the rtnl in this case, which leads to this sequence of
> events:
> - the rtnl is taken (by the net core)
> - the beacon's lock is taken
>
> But now in a beacon's work or an active scan work, what happens is:
> - work gets woken up
> - the beacon/scan lock is taken
> - a beacon/beacon-request frame is transmitted
> - the rtnl lock is taken during this transmission
>
> Lockdep then detects a possible circular dependency:
> [  490.153387]        CPU0                    CPU1
> [  490.153391]        ----                    ----
> [  490.153394]   lock(&local->beacons_lock);
> [  490.153400]                                lock(rtnl_mutex);
> [  490.153406]                                lock(&local->beacons_lock);
> [  490.153412]   lock(rtnl_mutex);
>
> So in practice, I always need to have the rtnl lock taken when
> acquiring these other locks (beacon/scan_lock) which I think is far
> from optimal.
>

*Note that those can also be false positives.

> 1# One solution is to drop the beacons/scan locks because they are not
> useful anymore and simply rely on the rtnl.
>

depends on how long it will be held.

> 2# Another solution would be to change the mlme_tx() implementation to
> finally not need the rtnl at all.
>
> Note that just calling ASSERT_RTNL() makes no difference in 2#, it
> still means that I always need to acquire the rtnl before acquiring the
> beacons/scan locks, which greatly reduces their usefulness and leads to
> solution 1# in the end.
>
> IIRC I decided to introduce the rtnl to avoid ->ndo_stop() calls during
> an MLME transmission. I don't know if it has another use there. If not,
> we may perhaps get rid of the rtnl in mlme_tx() by really handling the
> stop calls (but I was too lazy so far to do that).
>
> What direction would you advise?

Hard to say without code. Please show us some code of the current
state... there should also be some stacktrace of the circular lock
dependency, please provide the full output _matching_ the provided
code.

Thanks.

- Alex


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-04  1:50       ` Alexander Aring
@ 2022-06-06 17:03         ` Miquel Raynal
  2022-06-17 14:20         ` Miquel Raynal
  1 sibling, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-06-06 17:03 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Stefan Schmidt, Alexander Aring, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi Alex,

aahringo@redhat.com wrote on Fri, 3 Jun 2022 21:50:15 -0400:

> Hi,
> 
> On Fri, Jun 3, 2022 at 1:55 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Stefan, Alex,
> >
> > stefan@datenfreihafen.org wrote on Wed, 1 Jun 2022 23:01:51 +0200:
> >  
> > > Hello.
> > >
> > > On 01.06.22 05:30, Alexander Aring wrote:  
> > > > Hi,
> > > >
> > > > On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> > > > <miquel.raynal@bootlin.com> wrote:  
> > > >>
> > > >> Hello,
> > > >>
> > > >> This series brings support for that famous synchronous Tx API for MLME
> > > >> commands.
> > > >>
> > > >> MLME commands will be used during scan operations. In this situation,
> > > >> we need to be sure that all transfers finished and that no transfer
> > > >> will be queued for a short moment.
> > > >>  
> > > >
> > > > Acked-by: Alexander Aring <aahringo@redhat.com>  
> > >
> > > These patches have been applied to the wpan-next tree. Thanks!
> > >  
> > > > There will be now functions upstream which will never be used, Stefan
> > > > should wait until they are getting used before sending it to net-next.  
> > >
> > > Indeed this can wait until we have a consumer of the functions before pushing this forward to net-next. Pretty sure Miquel is happy to finally move on to other pieces of his puzzle and use them. :-)  
> >
> > Next part is coming!
> >
> > In the mean time I've experienced a new lockdep warning:
> >
> > All the netlink commands are executed with the rtnl taken.
> > In my current implementation, when I configure/edit a scan request or a
> > beacon request I take a scan_lock or a beacons_lock, so they may only
> > be taken after the rtnl in this case, which leads to this sequence of
> > events:
> > - the rtnl is taken (by the net core)
> > - the beacon's lock is taken
> >
> > But now in a beacon's work or an active scan work, what happens is:
> > - work gets woken up
> > - the beacon/scan lock is taken
> > - a beacon/beacon-request frame is transmitted
> > - the rtnl lock is taken during this transmission
> >
> > Lockdep then detects a possible circular dependency:
> > [  490.153387]        CPU0                    CPU1
> > [  490.153391]        ----                    ----
> > [  490.153394]   lock(&local->beacons_lock);
> > [  490.153400]                                lock(rtnl_mutex);
> > [  490.153406]                                lock(&local->beacons_lock);
> > [  490.153412]   lock(rtnl_mutex);
> >
> > So in practice, I always need to have the rtnl lock taken when
> > acquiring these other locks (beacon/scan_lock) which I think is far
> > from optimal.
> >  
> 
> *Note that those can also be false positives.
> 
> > 1# One solution is to drop the beacons/scan locks because they are not
> > useful anymore and simply rely on the rtnl.
> >  
> 
> depends on how long it will be held.
> 
> > 2# Another solution would be to change the mlme_tx() implementation to
> > finally not need the rtnl at all.
> >
> > Note that just calling ASSERT_RTNL() makes no difference in 2#, it
> > still means that I always need to acquire the rtnl before acquiring the
> > beacons/scan locks, which greatly reduces their usefulness and leads to
> > solution 1# in the end.
> >
> > IIRC I decided to introduce the rtnl to avoid ->ndo_stop() calls during
> > an MLME transmission. I don't know if it has another use there. If not,
> > we may perhaps get rid of the rtnl in mlme_tx() by really handling the
> > stop calls (but I was too lazy so far to do that).
> >
> > What direction would you advise?  
> 
> Hard to say without code. Please show us some code of the current
> state... there should also be some stacktrace of the circular lock
> dependency, please provide the full output _matching_ the provided
> code.

Of course, here is the branch that I used to produce the warning:
https://github.com/miquelraynal/linux/ branch wpan-next/scan

Triggering this is just a matter or executing nl802154_send_beacons().
And here is the trace which appears in the dmesg:

[  234.224911] mac802154_hwsim mac802154_hwsim: Added 2 mac802154 hwsim hardware radios
[  257.846221] Sending beacon

[  257.847439] ======================================================
[  257.847446] WARNING: possible circular locking dependency detected
[  257.847463] 5.18.0-rc4-uwb+ #217 Not tainted
[  257.847473] ------------------------------------------------------
[  257.847479] kworker/u4:4/53 is trying to acquire lock:
[  257.847488] ffffffff9d049d48 (rtnl_mutex){+.+.}-{3:3}, at: ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.847577] 
               but task is already holding lock:
[  257.847584] ffff89b082ea7ae0 (&local->beacons_lock){+.+.}-{3:3}, at: mac802154_beacons_work+0x1d/0xb0 [mac802154]
[  257.847651] 
               which lock already depends on the new lock.

[  257.847668] 
               the existing dependency chain (in reverse order) is:
[  257.847674] 
               -> #1 (&local->beacons_lock){+.+.}-{3:3}:
[  257.847702]        __mutex_lock+0x9d/0x9a0
[  257.847719]        mac802154_send_beacons+0x32/0x80 [mac802154]
[  257.847767]        nl802154_send_beacons+0xd7/0x1f0 [ieee802154]
[  257.847829]        genl_family_rcv_msg_doit+0xe5/0x140
[  257.847842]        genl_rcv_msg+0xd7/0x1e0
[  257.847852]        netlink_rcv_skb+0x4c/0xf0
[  257.847862]        genl_rcv+0x1f/0x30
[  257.847871]        netlink_unicast+0x191/0x260
[  257.847882]        netlink_sendmsg+0x22e/0x480
[  257.847892]        sock_sendmsg+0x59/0x60
[  257.847907]        ____sys_sendmsg+0x20c/0x260
[  257.847922]        ___sys_sendmsg+0x7c/0xc0
[  257.847932]        __sys_sendmsg+0x54/0xa0
[  257.847942]        do_syscall_64+0x3b/0x90
[  257.847956]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  257.847972] 
               -> #0 (rtnl_mutex){+.+.}-{3:3}:
[  257.847989]        __lock_acquire+0x1253/0x22e0
[  257.848002]        lock_acquire+0xca/0x2f0
[  257.848011]        __mutex_lock+0x9d/0x9a0
[  257.848023]        ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848058]        ieee802154_mlme_tx_one+0x2d/0x40 [mac802154]
[  257.848092]        mac802154_beacons_work.cold+0x100/0x110 [mac802154]
[  257.848135]        process_one_work+0x26f/0x5a0
[  257.848147]        worker_thread+0x4a/0x3d0
[  257.848158]        kthread+0xee/0x120
[  257.848168]        ret_from_fork+0x22/0x30
[  257.848180] 
               other info that might help us debug this:

[  257.848187]  Possible unsafe locking scenario:

[  257.848192]        CPU0                    CPU1
[  257.848198]        ----                    ----
[  257.848203]   lock(&local->beacons_lock);
[  257.848215]                                lock(rtnl_mutex);
[  257.848226]                                lock(&local->beacons_lock);
[  257.848236]   lock(rtnl_mutex);
[  257.848246] 
                *** DEADLOCK ***

[  257.848252] 3 locks held by kworker/u4:4/53:
[  257.848262]  #0: ffff89b0b66b4138 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0x1ef/0x5a0
[  257.848290]  #1: ffffa021404e3e78 ((work_completion)(&(&local->beacons_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1ef/0x5a0
[  257.848317]  #2: ffff89b082ea7ae0 (&local->beacons_lock){+.+.}-{3:3}, at: mac802154_beacons_work+0x1d/0xb0 [mac802154]
[  257.848371] 
               stack backtrace:
[  257.848388] CPU: 1 PID: 53 Comm: kworker/u4:4 Not tainted 5.18.0-rc4-uwb+ #217
[  257.848404] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[  257.848422] Workqueue: phy0 mac802154_beacons_work [mac802154]
[  257.848472] Call Trace:
[  257.848490]  <TASK>
[  257.848507]  dump_stack_lvl+0x45/0x59
[  257.848536]  check_noncircular+0xfe/0x110
[  257.848559]  __lock_acquire+0x1253/0x22e0
[  257.848581]  lock_acquire+0xca/0x2f0
[  257.848592]  ? ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848638]  __mutex_lock+0x9d/0x9a0
[  257.848652]  ? ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848690]  ? mark_held_locks+0x49/0x70
[  257.848701]  ? ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848737]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[  257.848753]  ? lockdep_hardirqs_on+0x79/0x100
[  257.848770]  ? ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848804]  ieee802154_mlme_tx+0xf/0x160 [mac802154]
[  257.848841]  ieee802154_mlme_tx_one+0x2d/0x40 [mac802154]
[  257.848878]  mac802154_beacons_work.cold+0x100/0x110 [mac802154]
[  257.848924]  process_one_work+0x26f/0x5a0
[  257.848944]  worker_thread+0x4a/0x3d0
[  257.848959]  ? process_one_work+0x5a0/0x5a0
[  257.848971]  kthread+0xee/0x120
[  257.848981]  ? kthread_complete_and_exit+0x20/0x20
[  257.848995]  ret_from_fork+0x22/0x30
[  257.849022]  </TASK>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support
  2022-06-04  1:50       ` Alexander Aring
  2022-06-06 17:03         ` Miquel Raynal
@ 2022-06-17 14:20         ` Miquel Raynal
  1 sibling, 0 replies; 19+ messages in thread
From: Miquel Raynal @ 2022-06-17 14:20 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Stefan Schmidt, Alexander Aring, linux-wpan - ML, David Girault,
	Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Network Development

Hi Alex,

aahringo@redhat.com wrote on Fri, 3 Jun 2022 21:50:15 -0400:

> Hi,
> 
> On Fri, Jun 3, 2022 at 1:55 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Stefan, Alex,
> >
> > stefan@datenfreihafen.org wrote on Wed, 1 Jun 2022 23:01:51 +0200:
> >  
> > > Hello.
> > >
> > > On 01.06.22 05:30, Alexander Aring wrote:  
> > > > Hi,
> > > >
> > > > On Thu, May 19, 2022 at 11:06 AM Miquel Raynal
> > > > <miquel.raynal@bootlin.com> wrote:  
> > > >>
> > > >> Hello,
> > > >>
> > > >> This series brings support for that famous synchronous Tx API for MLME
> > > >> commands.
> > > >>
> > > >> MLME commands will be used during scan operations. In this situation,
> > > >> we need to be sure that all transfers finished and that no transfer
> > > >> will be queued for a short moment.
> > > >>  
> > > >
> > > > Acked-by: Alexander Aring <aahringo@redhat.com>  
> > >
> > > These patches have been applied to the wpan-next tree. Thanks!
> > >  
> > > > There will be now functions upstream which will never be used, Stefan
> > > > should wait until they are getting used before sending it to net-next.  
> > >
> > > Indeed this can wait until we have a consumer of the functions before pushing this forward to net-next. Pretty sure Miquel is happy to finally move on to other pieces of his puzzle and use them. :-)  
> >
> > Next part is coming!
> >
> > In the mean time I've experienced a new lockdep warning:
> >
> > All the netlink commands are executed with the rtnl taken.
> > In my current implementation, when I configure/edit a scan request or a
> > beacon request I take a scan_lock or a beacons_lock, so they may only
> > be taken after the rtnl in this case, which leads to this sequence of
> > events:
> > - the rtnl is taken (by the net core)
> > - the beacon's lock is taken
> >
> > But now in a beacon's work or an active scan work, what happens is:
> > - work gets woken up
> > - the beacon/scan lock is taken
> > - a beacon/beacon-request frame is transmitted
> > - the rtnl lock is taken during this transmission
> >
> > Lockdep then detects a possible circular dependency:
> > [  490.153387]        CPU0                    CPU1
> > [  490.153391]        ----                    ----
> > [  490.153394]   lock(&local->beacons_lock);
> > [  490.153400]                                lock(rtnl_mutex);
> > [  490.153406]                                lock(&local->beacons_lock);
> > [  490.153412]   lock(rtnl_mutex);

So after a lot of thinking and different tries, I've opted for a
slightly different approach regarding the rtnl being taken in the mlme
tx path. What we want there is actually to be sure that the device
won't be turned off during the transmission. Either this is done
before, and the transmission will just return an error (and this is
fine) or there is no ndo_close() call and we are actually safe. So I've
actually introduced a mutex for serializing accesses to the "stop the
device" section which actually what we care about. It works well, avoid
keeping the rtnl in all the scan/beacons works (which would have been
a crazy thing to do IMHO) and allows to keep a beacons/scan mutex for
the configuration of these specific parts. I'll propose this change in
the upcoming series.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-06-17 14:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 15:05 [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 01/11] net: mac802154: Rename the synchronous xmit worker Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 02/11] net: mac802154: Rename the main tx_work struct Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 03/11] net: mac802154: Enhance the error path in the main tx helper Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 04/11] net: mac802154: Follow the count of ongoing transmissions Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 05/11] net: mac802154: Bring the ability to hold the transmit queue Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 06/11] net: mac802154: Create a hot tx path Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 07/11] net: mac802154: Introduce a helper to disable the queue Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 08/11] net: mac802154: Introduce a tx queue flushing mechanism Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 09/11] net: mac802154: Introduce a synchronous API for MLME commands Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 10/11] net: mac802154: Add a warning in the hot path Miquel Raynal
2022-05-19 15:05 ` [PATCH wpan-next v4 11/11] net: mac802154: Add a warning in the slow path Miquel Raynal
2022-06-01  3:30 ` [PATCH wpan-next v4 00/11] ieee802154: Synchronous Tx support Alexander Aring
2022-06-01  6:12   ` Miquel Raynal
2022-06-01 21:01   ` Stefan Schmidt
2022-06-03 17:55     ` Miquel Raynal
2022-06-04  1:50       ` Alexander Aring
2022-06-06 17:03         ` Miquel Raynal
2022-06-17 14:20         ` Miquel Raynal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).