linux-wpan.vger.kernel.org archive mirror
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>,
	linux-wpan - ML <linux-wpan@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	David Girault <david.girault@qorvo.com>,
	Romuald Despres <romuald.despres@qorvo.com>,
	Frederic Blain <frederic.blain@qorvo.com>,
	Nicolas Schodet <nico@ni.fr.eu.org>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Subject: Re: [PATCH wpan-next v2 13/14] net: mac802154: Introduce a tx queue flushing mechanism
Date: Fri, 4 Mar 2022 11:54:32 +0100
Message-ID: <20220304115432.7913f2ef@xps13>
In-Reply-To: <20220303191723.39b87766@xps13>

Hi Alexander,

miquel.raynal@bootlin.com wrote on Thu, 3 Mar 2022 19:17:23 +0100:

> Hi Alexander,
> 
> alex.aring@gmail.com wrote on Sun, 20 Feb 2022 18:49:06 -0500:
> 
> > Hi,
> > 
> > On Mon, Feb 7, 2022 at 9:48 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > >
> > > Right now we are able to stop a queue but we have no indication if a
> > > transmission is ongoing or not.
> > >
> > > Thanks to recent additions, we can track the number of ongoing
> > > transmissions so we know when the last transmission is over. Adding an
> > > internal wait queue on top of it also allows us to be woken up
> > > asynchronously when this happens. If we have beforehand marked the queue
> > > as held and stopped it, we end up flushing and stopping the tx queue.
> > >
> > > Thanks to this feature, we will soon be able to introduce a synchronous
> > > transmit API.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  include/net/cfg802154.h      |  1 +
> > >  net/ieee802154/core.c        |  1 +
> > >  net/mac802154/cfg.c          |  5 ++---
> > >  net/mac802154/ieee802154_i.h |  1 +
> > >  net/mac802154/tx.c           | 11 ++++++++++-
> > >  net/mac802154/util.c         |  3 ++-
> > >  6 files changed, 17 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
> > > index 043d8e4359e7..0d385a214da3 100644
> > > --- a/include/net/cfg802154.h
> > > +++ b/include/net/cfg802154.h
> > > @@ -217,6 +217,7 @@ struct wpan_phy {
> > >         /* Transmission monitoring and control */
> > >         atomic_t ongoing_txs;
> > >         atomic_t hold_txs;
> > > +       wait_queue_head_t sync_txq;
> > >
> > >         char priv[] __aligned(NETDEV_ALIGN);
> > >  };
> > > diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
> > > index de259b5170ab..0953cacafbff 100644
> > > --- a/net/ieee802154/core.c
> > > +++ b/net/ieee802154/core.c
> > > @@ -129,6 +129,7 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
> > >         wpan_phy_net_set(&rdev->wpan_phy, &init_net);
> > >
> > >         init_waitqueue_head(&rdev->dev_wait);
> > > +       init_waitqueue_head(&rdev->wpan_phy.sync_txq);
> > >
> > >         return &rdev->wpan_phy;
> > >  }
> > > diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
> > > index e8aabf215286..da94aaa32fcb 100644
> > > --- a/net/mac802154/cfg.c
> > > +++ b/net/mac802154/cfg.c
> > > @@ -46,8 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
> > >         if (!local->open_count)
> > >                 goto suspend;
> > >
> > > -       atomic_inc(&wpan_phy->hold_txs);
> > > -       ieee802154_stop_queue(&local->hw);
> > > +       ieee802154_sync_and_stop_tx(local);
> > >         synchronize_net();
> > >
> > >         /* stop hardware - this must stop RX */
> > > @@ -73,7 +72,7 @@ static int ieee802154_resume(struct wpan_phy *wpan_phy)
> > >                 return ret;
> > >
> > >  wake_up:
> > > -       if (!atomic_dec_and_test(&wpan_phy->hold_txs))
> > > +       if (!atomic_read(&wpan_phy->hold_txs))
> > >                 ieee802154_wake_queue(&local->hw);
> > >         local->suspended = false;
> > >         return 0;
> > > diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
> > > index 56fcd7ef5b6f..295c9ce091e1 100644
> > > --- a/net/mac802154/ieee802154_i.h
> > > +++ b/net/mac802154/ieee802154_i.h
> > > @@ -122,6 +122,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
> > >
> > >  void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
> > >  void ieee802154_xmit_sync_worker(struct work_struct *work);
> > > +void ieee802154_sync_and_stop_tx(struct ieee802154_local *local);
> > >  netdev_tx_t
> > >  ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > >  netdev_tx_t
> > > diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> > > index abd9a057521e..06ae2e6cea43 100644
> > > --- a/net/mac802154/tx.c
> > > +++ b/net/mac802154/tx.c
> > > @@ -47,7 +47,8 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
> > >                 ieee802154_wake_queue(&local->hw);
> > >
> > >         kfree_skb(skb);
> > > -       atomic_dec(&local->phy->ongoing_txs);
> > > +       if (!atomic_dec_and_test(&local->phy->ongoing_txs))
> > > +               wake_up(&local->phy->sync_txq);
> > >         netdev_dbg(dev, "transmission failed\n");
> > >  }
> > >
> > > @@ -117,6 +118,14 @@ ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
> > >         return ieee802154_tx(local, skb);
> > >  }
> > >
> > > +void ieee802154_sync_and_stop_tx(struct ieee802154_local *local)
> > > +{
> > > +       atomic_inc(&local->phy->hold_txs);
> > > +       ieee802154_stop_queue(&local->hw);
> > > +       wait_event(local->phy->sync_txq, !atomic_read(&local->phy->ongoing_txs));
> > > +       atomic_dec(&local->phy->hold_txs);    
> > 
> > In my opinion this _still_ races as I mentioned earlier. You need to
> > be sure that, when you call ieee802154_stop_queue(), no ieee802154_tx()
> > or hot_tx() is running at the same time. Look into the function I
> > mentioned earlier, netif_tx_disable().
> 
> I think I now get the problem, but I am having trouble understanding
> the logic in netif_tx_disable(), or rather the idea that I should
> adapt to our situation.
> 
> I understand that we should make sure the following situation does not
> happen:
> - ieee802154_subif_start_xmit() is called
> - ieee802154_subif_start_xmit() is called again
> - ieee802154_tx() gets executed once and stops the queue
> - ongoing_txs gets incremented once
> - the first transfer finishes and ongoing_txs gets decremented
> - the current series considers the tx queue empty while the second
>   transfer requested earlier has not been processed yet and will
>   definitely be tried a moment later.
> 
> I can't find a pretty solution for that. Is your suggestion to use the
> netdev tx_global_lock? If so, how? It is not clear to me how we should
> tackle this issue.

I had a second look at it and it appears to me that the issue was
already there and is structural. We just did not really care about it
because we didn't bother with synchronization issues.

Here is a figure to base our discussions on:

                       enable
                         ┌────────────────────────────────────────────────────────────┐
                         │                                                            │
                         ▼                                                            │
          packet     ┌────────┐   ┌────────────┐   ┌────────────┐   ┌───────┐   ┌─────┴─────┐
            ┌┐       │        │   │            │   │            │   │       │   │           │
User  ──────┴┴──────►│ Queue  ├──►│ ieee*_tx() ├──►│stop_queue()├──►│xmit() ├──►│ wait/sync │
                     │        │   │            │   │            │   │       │   │           │
                     └────────┘   └────────────┘   └─────┬──────┘   └───────┘   └───────────┘
                         ▲                               │
                         │                               │
                         │                               │
                         └───────────────────────────────┘
                      disable

I assumed that we don't have control over the queuing mechanism (on the
left of the 'queue' box). I looked at the core code under
net/ieee802154/ and, even though incrementing a counter there would be
handy, I assumed this was not an acceptable solution.

So we may end up with two (or more) packets that must be processed by
the mac tx logic (on the right of the 'queue' box). The problem is, of
course, that stop_queue() is not atomic with respect to the calls to
ieee802154_tx(): several packets can be in the middle of being
processed, and we have no way to know it.
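To make the window concrete, here is a single-threaded userspace model that replays the interleaving from the quoted list. It is only an illustration under stated assumptions: `struct race_model` and the `race_*` helpers are hypothetical stand-ins for the netif queue state, `ongoing_txs` and the flusher's wait condition, not mac802154 code, and the model follows the quoted list rather than the exact locking done by the networking core.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the unlocked tx path. Names are illustrative
 * only; they are not mac802154 functions.
 */
struct race_model {
	bool queue_stopped; /* stands in for the netif queue state */
	int ongoing_txs;    /* stands in for wpan_phy->ongoing_txs */
	int accepted;       /* frames that passed the "queue open?" check */
	int started;        /* frames that reached the ieee802154_tx() step */
};

/* Step 1: the core sees an open queue and hands the frame over. */
static bool race_xmit_check(struct race_model *m)
{
	if (m->queue_stopped)
		return false;
	m->accepted++;
	return true;
}

/* Step 2: ieee802154_tx() stops the queue and counts the transfer. */
static void race_tx_begin(struct race_model *m)
{
	m->queue_stopped = true;
	m->ongoing_txs++;
	m->started++;
}

/* Completion; the queue is not woken because a flusher holds it. */
static void race_tx_complete(struct race_model *m)
{
	assert(m->ongoing_txs > 0);
	m->ongoing_txs--;
}

/* The condition the flusher waits for before returning. */
static bool race_flusher_sees_idle(const struct race_model *m)
{
	return m->queue_stopped && m->ongoing_txs == 0;
}

/* Frames accepted by the core but not counted anywhere yet. */
static int race_untracked(const struct race_model *m)
{
	return m->accepted - m->started;
}
```

Calling race_xmit_check() twice, then race_tx_begin() and race_tx_complete() once, leaves race_flusher_sees_idle() returning true while race_untracked() still reports one frame: the flusher would return although a packet is still on its way.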

Moving the stop_queue() call earlier would just shrink the racy window
without fully closing it, so it is not a solution per se.

Perhaps we could monitor the state of the queue; it would tell us
whether we need to retain a packet. I personally find this a bit crappy,
yet it would probably work. Here is a draft implementation; I'm only
half convinced this is a good idea, and your input is welcome here:

--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -77,14 +77,26 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
                put_unaligned_le16(crc, skb_put(skb, 2));
        }
 
+retry:
+       while (netif_queue_stopped())
+               schedule();
+
+       acquire_lock();
+       if (netif_queue_stopped()) {
+               release_lock();
+               goto retry;
+       }
+
        /* Stop the netif queue on each sub_if_data object. */
        ieee802154_stop_queue(&local->hw);
 
+       atomic_inc(&local->phy->ongoing_txs);
+       release_lock();
+
        /* Drivers should preferably implement the async callback. In some rare
         * cases they only provide a sync callback which we will use as a
         * fallback.
         */
        if (local->ops->xmit_async) {
                unsigned int len = skb->len;
 
@@ -122,8 +134,10 @@ int ieee802154_sync_and_stop_tx(struct ieee802154_local *local)
 {
        int ret;
 
+       acquire_lock();
        atomic_inc(&local->phy->hold_txs);
        ieee802154_stop_queue(&local->hw);
+       release_lock();
        wait_event(local->phy->sync_txq, !atomic_read(&local->phy->ongoing_txs));
        ret = local->tx_result;
        atomic_dec(&local->phy->hold_txs);
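For what it's worth, here is a userspace model of the same idea, assuming a pthread mutex for the hypothetical acquire_lock()/release_lock() pair and a condition variable for sync_txq. The property that matters is that the "queue stopped?" check and the ongoing_txs increment sit in the same critical section as the flusher's stop. Unlike the draft, model_tx_begin() returns false instead of looping on schedule(), leaving the retry policy to the caller, and the hold is dropped by a separate model_release() call. All names here are made up for the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model: the mutex stands in for the draft's
 * acquire_lock()/release_lock(), the condition variable for sync_txq.
 */
struct txq_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;  /* broadcast when ongoing_txs drops to 0 */
	bool stopped;         /* queue state, only changed under lock */
	int hold_txs;
	int ongoing_txs;
};

#define TXQ_MODEL_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0, 0 }

/*
 * Transmit entry: check, stop and count in one critical section, so the
 * flusher never sees ongoing_txs == 0 while a frame has passed the check.
 */
static bool model_tx_begin(struct txq_model *q)
{
	bool ok;

	pthread_mutex_lock(&q->lock);
	ok = !q->stopped;
	if (ok) {
		q->stopped = true;  /* one frame in flight, like stop_queue() */
		q->ongoing_txs++;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Completion: reopen the queue unless held, wake the flusher at zero. */
static void model_tx_complete(struct txq_model *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->ongoing_txs > 0);
	if (!q->hold_txs)
		q->stopped = false;
	if (--q->ongoing_txs == 0)
		pthread_cond_broadcast(&q->idle);
	pthread_mutex_unlock(&q->lock);
}

/* Flush: hold and stop under the lock, then sleep until idle. */
static void model_sync_and_stop(struct txq_model *q)
{
	pthread_mutex_lock(&q->lock);
	q->hold_txs++;
	q->stopped = true;
	while (q->ongoing_txs)
		pthread_cond_wait(&q->idle, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

/* Drop the hold; the last holder reopens an idle queue. */
static void model_release(struct txq_model *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->hold_txs > 0);
	if (--q->hold_txs == 0 && !q->ongoing_txs)
		q->stopped = false;
	pthread_mutex_unlock(&q->lock);
}
```

With this shape, once model_sync_and_stop() returns, every frame that passed the check has been counted and completed, and model_tx_begin() fails until model_release() drops the last hold.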


If we go in this direction, perhaps it's best to rework the sync API
as you already proposed: just stop the queue and sync the ongoing
transfers, so that afterwards we can use a dedicated tx path for MLME
commands that bypasses the queue-is-stopped check. This way we avoid
the risk of delivering data packets between two MLME calls.
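A rough single-threaded sketch of that split, assuming locking and the actual wait on sync_txq are omitted for brevity. split_data_tx(), split_mlme_tx(), split_hold() and split_release() are hypothetical names, not existing mac802154 functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the proposed split. Locking and
 * the sleep on sync_txq are left out; all names are illustrative.
 */
struct split_model {
	bool stopped;
	int hold_txs;
	int ongoing_txs;
	int data_sent;
	int mlme_sent;
};

/* Regular data path: honours the queue state. */
static bool split_data_tx(struct split_model *m)
{
	if (m->stopped)
		return false;  /* frame stays queued */
	m->ongoing_txs++;
	m->data_sent++;
	return true;
}

/* Dedicated MLME path: ignores the queue state, but only runs while the
 * queue is held and nothing else is in flight. */
static bool split_mlme_tx(struct split_model *m)
{
	if (!m->hold_txs || m->ongoing_txs)
		return false;
	m->ongoing_txs++;
	m->mlme_sent++;
	return true;
}

static void split_tx_done(struct split_model *m)
{
	assert(m->ongoing_txs > 0);
	m->ongoing_txs--;
}

/* Stop and hold the queue; a real flush would then wait for
 * ongoing_txs to reach zero. */
static void split_hold(struct split_model *m)
{
	m->hold_txs++;
	m->stopped = true;
}

static void split_release(struct split_model *m)
{
	assert(m->hold_txs > 0);
	if (--m->hold_txs == 0)
		m->stopped = false;
}
```

Between split_hold() and split_release(), split_data_tx() keeps failing, so no data frame can slip in between two split_mlme_tx() calls; split_mlme_tx() also refuses to start while a frame is still in flight.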

Thanks,
Miquèl
