linux-wpan.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Alexander Aring <aahringo@redhat.com>
Cc: Alexander Aring <alex.aring@gmail.com>,
	Stefan Schmidt <stefan@datenfreihafen.org>,
	linux-wpan - ML <linux-wpan@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	Network Development <netdev@vger.kernel.org>,
	David Girault <david.girault@qorvo.com>,
	Romuald Despres <romuald.despres@qorvo.com>,
	Frederic Blain <frederic.blain@qorvo.com>,
	Nicolas Schodet <nico@ni.fr.eu.org>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Subject: Re: [PATCH wpan-next 09/20] net: mac802154: Introduce a global device lock
Date: Fri, 19 Aug 2022 19:06:18 +0200	[thread overview]
Message-ID: <20220819190618.4647849f@xps-13> (raw)
In-Reply-To: <CAK-6q+hu4YGfU9V5EkRiT+Z8MJhOEeVsVv=vEz5fHPkDL99=TQ@mail.gmail.com>

Hi Alexander,

I hope you've had a wonderful summer :-)

aahringo@redhat.com wrote on Sun, 3 Jul 2022 21:12:43 -0400:

> Hi,
> 
> On Fri, Jul 1, 2022 at 10:36 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > The purpose of this device lock is to prevent the removal of the device
> > while an asynchronous MLME operation happens. The RTNL works well for
> > that but in a later series having the RTNL taken here will be
> > problematic and will cause lockdep to warn us about a circular
> > dependency. We don't really need the RTNL here, just a serialization
> > over this operation.
> >
> > Replace the RTNL calls with this new lock.  
> 
> I am unhappy about this solution. Can we not interrupt the ongoing
> operation "scan" here and come to an end at a stop?
> 
> The RTNL is NOT only to prevent the removal of something... If mostly
> all operations are protected by and I know one which makes trouble
> here... setting page/channel. I know we don't hold the rtnl lock on
> other transmit functionality for phy settings which has other reasons
> why we allow it... but here we offer a mac operation which delivers
> wrong results if somebody does another setting e.g. set page/channel
> while scan is going on and we should prevent this.
> 
> Dropping the rtnl lock, yes we can do that... I cannot think about all
> the side effects which this change will bring into, one I know is the
> channel setting, mostly everything that is interfering with a scan and
> then ugly things which we don't want... preparing the code for the
> page/channel gives us a direction on how to fix and check the other
> cases if we find them. btw: we should do this on another approach
> anyway because the rtnl lock is not held during a whole operation and
> we don't want that.
> 
> We should also take care that we hold some references which we held
> during the scan, even if it's protected by stop (just for
> correctness).

I was also a bit unhappy by this solution but the rtnl is a real mess
when playing with background works. At least I was not able at all to
make it fit. I'm gonna try to summarize the situation to argue in favor
of the current solution, but I am really open if you see another way.

A scan is started by the user, through a netlink command. It basically
involves stopping any other activity on the transceiver, setting a
particular filtering mode, and possibly sending beacons through the MLME
Tx API at a regular interval.

A scan command from the user then involves acquiring the rtnl just to
be sure that nothing else is requested in parallel. The rtnl is taken
and released by the netlink core, just for the time of the
configuring/triggering action.

We absolutely do not want to keep the rtnl here, I believe we are
aligned on that. This means we need to protect ourselves against a
number of user actions:
1- dropping the device (without stopping the background job/cleaning
   everything),
2- transmitting packets
3- changing internal parameters such as the page/channel to avoid
   messing with the ongoing scan.

The current implementation does the following:
1- in the ieee802154 layer we call dev_hold/dev_put to prevent device
   removal,
2- in the soft mac layer we stop the queue,
3- in the soft mac layer we refuse any channel change command coming
   from the netlink layer during scans, because this is not a nl
   constraint, but a mac state constraint, so I think it is safe to
   handle that from the soft mac layer rather than at the nl level.

This is how I planned to handle the refcount and channel change issues.

Now, let me try to argue in favor of this commit.

The problem I faced was a circular dependency on the device sending
beacons or beacons requests, ie. sending MLME frames in the background.
For the record, in both cases, I need to put some parameters in one of
the main soft mac structures. I created local->scan_lock and
local->beacon_lock to protect accesses to the scanning and beaconing
structures respectively (we don't want eg. the struct to be freed while
a job is using it).

Let's take the situation of the device sending beacons in the
background.

For starting to send beacons, the user sends a netlink command. In the
kernel first layers, the rtnl is acquired (almost) automatically, then
the callback function in the soft mac does the job. One of the first
operations is to acquire the beacons_lock.

Lockdep detects that during the background operation, the kworker will
first acquire beacons_lock (it encloses the whole operation) and after
acquiring this first lock it will perform an MLME Tx to send the
beacon. But this, unfortunately, acquires the rtnl, which triggers the
following warning:

[ 1445.105706]  Possible unsafe locking scenario:
//               -> background job            -> nl802154_send_beacons()   
[ 1445.105707]        CPU0                    CPU1
[ 1445.105708]        ----                    ----
[ 1445.105709]   lock(&local->beacon_lock);
[ 1445.105710]                                lock(rtnl_mutex);
[ 1445.105712]                                lock(&local->beacon_lock);
[ 1445.105713]   lock(rtnl_mutex);

Exactly the same happens in the scanning path during active scans:

[   52.518741]  Possible unsafe locking scenario:
//               -> background job            -> nl802154_trigger_scan()
[   52.518742]        CPU0                    CPU1
[   52.518743]        ----                    ----
[   52.518744]   lock(&local->scan_lock);
[   52.518746]                                lock(rtnl_mutex);
[   52.518748]                                lock(&local->scan_lock);
[   52.518750]   lock(rtnl_mutex);

In practice I doubt these situations can really happen because there is
no background job running if the triggering netlink command was not
yet called, but anyway, I feel too weak against locking scenarios
to disobey such a clear lockdep warning :-)

So, from my understanding it was safe not to acquire the rtnl in the
MLME Tx path, as long as the calls were serialized (with another
mutex). You seem not to agree with it, which I completely understand,
but then how do I handle those circular dependencies?

Do you think like me they are false positives?

Thanks,
Miquèl

  reply	other threads:[~2022-08-19 17:45 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-01 14:30 [PATCH wpan-next 00/20] net: ieee802154: Support scanning/beaconing Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 01/20] net: mac802154: Allow the creation of coordinator interfaces Miquel Raynal
2022-07-06  1:51   ` Alexander Aring
2022-08-19 17:11     ` Miquel Raynal
2022-08-23 12:33       ` Alexander Aring
2022-08-23 16:29         ` Miquel Raynal
2022-08-23 21:44           ` Alexander Aring
2022-08-24  7:35             ` Miquel Raynal
2022-08-24 21:43               ` Alexander Aring
2022-08-25  8:40                 ` Miquel Raynal
2022-08-26  0:51                   ` Alexander Aring
2022-08-26  1:35                     ` Alexander Aring
2022-08-26  8:08                       ` Miquel Raynal
2022-08-29  2:31                         ` Alexander Aring
2022-08-29  8:05                           ` Miquel Raynal
2022-08-26  7:30                     ` Miquel Raynal
2022-08-24 10:20             ` Miquel Raynal
2022-08-24 12:43               ` Alexander Aring
2022-08-24 13:26                 ` Miquel Raynal
2022-08-24 21:53                   ` Alexander Aring
2022-08-25  1:02                     ` Alexander Aring
2022-08-25  8:46                       ` Miquel Raynal
2022-08-25 12:58                     ` Miquel Raynal
2022-08-26  1:05                       ` Alexander Aring
2022-08-26  7:54                         ` Miquel Raynal
2022-08-29  2:52                           ` Alexander Aring
2022-08-29  8:02                             ` Miquel Raynal
2022-08-30  2:23                               ` Alexander Aring
2022-08-31 15:39                                 ` Miquel Raynal
2022-09-01  0:09                                   ` Miquel Raynal
2022-09-01 13:09                                     ` Miquel Raynal
2022-09-02  2:38                                       ` Alexander Aring
2022-09-03  0:08                                         ` Miquel Raynal
2022-09-03 14:20                                           ` Alexander Aring
2022-09-03 14:31                                             ` Alexander Aring
2022-09-03 16:05                                             ` Miquel Raynal
2022-09-03 18:21                                               ` Alexander Aring
2022-09-03 18:29                                                 ` Alexander Aring
2022-09-03 19:07                                               ` Alexander Aring
2022-09-03 19:10                                                 ` Alexander Aring
2022-09-03 19:40                                                   ` Alexander Aring
2022-09-05  3:16                                                     ` Miquel Raynal
2022-09-05 22:35                                                       ` Alexander Aring
2022-09-02  2:23                                     ` Alexander Aring
2022-09-02  2:39                                       ` Alexander Aring
2022-09-02  2:09                                   ` Alexander Aring
2022-07-01 14:30 ` [PATCH wpan-next 02/20] net: ieee802154: Advertize coordinators discovery Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 03/20] net: ieee802154: Handle " Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 04/20] net: ieee802154: Trace the registration of new PANs Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 05/20] net: ieee802154: Define frame types Miquel Raynal
2022-07-11  2:06   ` Alexander Aring
2022-08-19 17:13     ` Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 06/20] net: ieee802154: Add support for user scanning requests Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 07/20] net: ieee802154: Define a beacon frame header Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 08/20] net: mac802154: Prepare forcing specific symbol duration Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 09/20] net: mac802154: Introduce a global device lock Miquel Raynal
2022-07-04  1:12   ` Alexander Aring
2022-08-19 17:06     ` Miquel Raynal [this message]
2022-07-01 14:30 ` [PATCH wpan-next 10/20] net: mac802154: Handle passive scanning Miquel Raynal
2022-07-15  3:33   ` Alexander Aring
2022-07-15  3:42     ` Alexander Aring
2022-08-19 17:22       ` Miquel Raynal
2022-08-01 23:42     ` Alexander Aring
2022-08-01 23:54       ` Alexander Aring
2022-07-01 14:30 ` [PATCH wpan-next 11/20] net: ieee802154: Add support for user beaconing requests Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 12/20] net: mac802154: Handle basic beaconing Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 13/20] net: ieee802154: Add support for user active scan requests Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 14/20] net: mac802154: Handle active scanning Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 15/20] net: ieee802154: Add support for allowing to answer BEACON_REQ Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 16/20] net: mac802154: Handle received BEACON_REQ Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 17/20] net: ieee802154: Handle limited devices with only datagram support Miquel Raynal
2022-07-15  3:16   ` Alexander Aring
2022-08-19 17:13     ` Miquel Raynal
2022-08-23 12:43       ` Alexander Aring
2022-07-01 14:30 ` [PATCH wpan-next 18/20] ieee802154: ca8210: Flag the driver as being limited Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 19/20] ieee802154: hwsim: Do not check the rtnl Miquel Raynal
2022-07-06  1:23   ` Alexander Aring
2022-08-01 23:58     ` Alexander Aring
2022-08-19 17:09     ` Miquel Raynal
2022-08-25 22:41       ` Miquel Raynal
2022-07-01 14:30 ` [PATCH wpan-next 20/20] ieee802154: hwsim: Allow devices to be coordinators Miquel Raynal
2022-07-11  2:01   ` Alexander Aring
2022-08-19 17:12     ` Miquel Raynal
2022-07-04  1:17 ` [PATCH wpan-next 00/20] net: ieee802154: Support scanning/beaconing Alexander Aring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220819190618.4647849f@xps-13 \
    --to=miquel.raynal@bootlin.com \
    --cc=aahringo@redhat.com \
    --cc=alex.aring@gmail.com \
    --cc=davem@davemloft.net \
    --cc=david.girault@qorvo.com \
    --cc=edumazet@google.com \
    --cc=frederic.blain@qorvo.com \
    --cc=kuba@kernel.org \
    --cc=linux-wpan@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nico@ni.fr.eu.org \
    --cc=pabeni@redhat.com \
    --cc=romuald.despres@qorvo.com \
    --cc=stefan@datenfreihafen.org \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).