All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix __ieee80211_disconnect when not associated
@ 2022-10-25 20:34 James Prestwood
  2022-10-25 20:34 ` [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect James Prestwood
  2023-01-18 16:06 ` [PATCH 0/1] Fix __ieee80211_disconnect when not associated Johannes Berg
  0 siblings, 2 replies; 8+ messages in thread
From: James Prestwood @ 2022-10-25 20:34 UTC (permalink / raw)
  To: linux-wireless; +Cc: James Prestwood

A user reported some behavior where IWD hangs expecting another event
to come and it never does. This was due to the firmware (iwlwifi)
timing out after authentication and calling __ieee80211_disconnect
which essentially does nothing if not associated. The problem here
is userspace expects some event to come after authenticating whether
it be an association, disconnect, death etc.

This my attempt at fixing the problem. Note that the real issue
here is iwlwifi. The user can reliably reproduce this problem which
points to a firmware issue since a connection loss should be quite
rare I assume. Regardless, any bad driver/firmware could cause this
behavior since __ieee80211_disconnect only handles being associated.

I personally cannot test this, so I hacked mac80211 to call
__ieee80211_disconnect immediately after sending the authenticate
event. This effectively causes the same behavior and indeed this
patch causes a disconnect which is the behavior I would expect.

I asked about this a week ago with no response so hopefully this
gets more attention and at least gets the conversation going, even
if I went about the fix wrong.

James Prestwood (1):
  wifi: mac80211: handle connection loss in __ieee80211_disconnect

 net/mac80211/mlme.c | 5 +++++
 1 file changed, 5 insertions(+)

-- 
2.34.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect
  2022-10-25 20:34 [PATCH 0/1] Fix __ieee80211_disconnect when not associated James Prestwood
@ 2022-10-25 20:34 ` James Prestwood
  2023-01-18 16:04   ` Johannes Berg
  2023-01-18 16:06 ` [PATCH 0/1] Fix __ieee80211_disconnect when not associated Johannes Berg
  1 sibling, 1 reply; 8+ messages in thread
From: James Prestwood @ 2022-10-25 20:34 UTC (permalink / raw)
  To: linux-wireless; +Cc: James Prestwood

The ieee80211_connection_loss() can be called during times when the
kernels state is in flux, such as after a successful authentication
but prior to successful association. This can result in the kernel
never telling userspace due to __ieee80211_disconnect bailing out
if !ifmgd->associated. This has been seen out in the wild on
iwlwifi:

[503619.324379] wlan0: disconnect from AP d0:15:a6:70:e1:20 for new auth to d0:15:a6:70:b5:40
[503619.367204] wlan0: authenticate with d0:15:a6:70:b5:40
[503619.367233] wlan0: bad VHT capabilities, disabling VHT
[503619.367236] wlan0: Invalid HE elem, Disable HE
[503619.367237] wlan0: 80 MHz not supported, disabling VHT
[503619.371184] wlan0: send auth to d0:15:a6:70:b5:40 (try 1/3)
[503619.406401] wlan0: authenticated
[503620.270833] iwlwifi 0000:00:14.3: Not associated and the session protection is over already...
[503620.270943] wlan0: Connection to AP d0:15:a6:70:b5:40 lost

At this point userspace has received a CMD_AUTHENTICATE event but
nothing else. No disconnect or anything to indicate something is
wrong. Userspace supplicants expect _something_ to come after a
successful authentication.

This patch modifies __ieee80211_disconnect() to call
cfg80211_disconnect which will ultimately send a disconnect event
to userspace notifying it of the situation.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
---
 net/mac80211/mlme.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index fc764984d687..5c88cf717fb2 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -3270,6 +3270,11 @@ static void __ieee80211_disconnect(struct ieee80211_sub_if_data *sdata)
 
 	sdata_lock(sdata);
 	if (!ifmgd->associated) {
+		if (ifmgd->connection_loss)
+			cfg80211_disconnected(sdata->wdev.netdev,
+					      WLAN_REASON_UNSPECIFIED, NULL, 0,
+					      true, GFP_KERNEL);
+
 		sdata_unlock(sdata);
 		return;
 	}
-- 
2.34.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect
  2022-10-25 20:34 ` [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect James Prestwood
@ 2023-01-18 16:04   ` Johannes Berg
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Berg @ 2023-01-18 16:04 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

On Tue, 2022-10-25 at 13:34 -0700, James Prestwood wrote:
> The ieee80211_connection_loss() can be called during times when the
> kernels state is in flux, such as after a successful authentication
> but prior to successful association. This can result in the kernel
> never telling userspace due to __ieee80211_disconnect bailing out
> if !ifmgd->associated. This has been seen out in the wild on
> iwlwifi:
> 
> [503619.324379] wlan0: disconnect from AP d0:15:a6:70:e1:20 for new auth to d0:15:a6:70:b5:40
> [503619.367204] wlan0: authenticate with d0:15:a6:70:b5:40
> [503619.367233] wlan0: bad VHT capabilities, disabling VHT
> [503619.367236] wlan0: Invalid HE elem, Disable HE
> [503619.367237] wlan0: 80 MHz not supported, disabling VHT
> [503619.371184] wlan0: send auth to d0:15:a6:70:b5:40 (try 1/3)
> [503619.406401] wlan0: authenticated
> [503620.270833] iwlwifi 0000:00:14.3: Not associated and the session protection is over already...
> [503620.270943] wlan0: Connection to AP d0:15:a6:70:b5:40 lost
> 
> At this point userspace has received a CMD_AUTHENTICATE event but
> nothing else. No disconnect or anything to indicate something is
> wrong. Userspace supplicants expect _something_ to come after a
> successful authentication.
> 

I'm not sure I understand this scenario - there's nothing wrong with
this, is there?

The way I read this is that userspace simply asked for auth, but then
didn't (quickly enough) ask for assoc? Or got a comeback or something
(though we would log that too, I think)?

Userspace got a successful NL80211_CMD_AUTHENTICATE event at this point,
but didn't associate (yet anyway).

You could argue that iwlwifi shouldn't be waiting for an association if
nobody ever requested association, but that's a driver bug, and
shouldn't cause any problem with state between userspace and kernel.

johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated
  2022-10-25 20:34 [PATCH 0/1] Fix __ieee80211_disconnect when not associated James Prestwood
  2022-10-25 20:34 ` [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect James Prestwood
@ 2023-01-18 16:06 ` Johannes Berg
  2023-01-18 17:54   ` James Prestwood
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Berg @ 2023-01-18 16:06 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

Oh I guess I should read the cover letter too ...

On Tue, 2022-10-25 at 13:34 -0700, James Prestwood wrote:
> A user reported some behavior where IWD hangs expecting another event
> to come and it never does. This was due to the firmware (iwlwifi)
> timing out after authentication and calling __ieee80211_disconnect
> which essentially does nothing if not associated. The problem here
> is userspace expects some event to come after authenticating whether
> it be an association, disconnect, death etc.
> 

Basically I don't understand why userspace expects some event. It asked
for authentication, and you got it. That's all. I don't see userspace
asking for association, or anything else, so what would it be waiting
for?

johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated
  2023-01-18 16:06 ` [PATCH 0/1] Fix __ieee80211_disconnect when not associated Johannes Berg
@ 2023-01-18 17:54   ` James Prestwood
  2023-01-18 18:08     ` Johannes Berg
  0 siblings, 1 reply; 8+ messages in thread
From: James Prestwood @ 2023-01-18 17:54 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On Wed, 2023-01-18 at 17:06 +0100, Johannes Berg wrote:
> Oh I guess I should read the cover letter too ...
> 
> On Tue, 2022-10-25 at 13:34 -0700, James Prestwood wrote:
> > A user reported some behavior where IWD hangs expecting another
> > event
> > to come and it never does. This was due to the firmware (iwlwifi)
> > timing out after authentication and calling __ieee80211_disconnect
> > which essentially does nothing if not associated. The problem here
> > is userspace expects some event to come after authenticating
> > whether
> > it be an association, disconnect, death etc.
> > 
> 
> Basically I don't understand why userspace expects some event. It
> asked
> for authentication, and you got it. That's all. I don't see userspace
> asking for association, or anything else, so what would it be waiting
> for?

I should have explained this better. I dug up the old thread from the
original report and IIRC this is the sequence of events:

1. Begin reassociation to new BSS via CMD_CONNECT. This results in the
kernel sending many events to remove the current BSS in favor of the
new one:

2. Receive DEL_STATION, DEAUTHENTICATE, DISCONNECT, NEW_STATION

3. Then a CMD_AUTHENTICATE, with a success status.

4. At this point the firmware decides its not gonna continue and calls
__ieee80211_disconnect which is a no-op when not associated. We assumed
either CMD_ASSOCIATE or CMD_CONNECT will come after CMD_AUTHENTICATE,
or a CMD_DEAUTH if there was a problem.

I will say, we do see a CMD_DEL_STATION event here which we never
processed in this case. But again, I would expect a CMD_DEAUTHENTICATE
since we successfully authenticated, right?

Thanks,
James


> 
> johannes



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated
  2023-01-18 17:54   ` James Prestwood
@ 2023-01-18 18:08     ` Johannes Berg
  2023-01-18 18:39       ` James Prestwood
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Berg @ 2023-01-18 18:08 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

On Wed, 2023-01-18 at 09:54 -0800, James Prestwood wrote:
> > 
> > Basically I don't understand why userspace expects some event. It
> > asked
> > for authentication, and you got it. That's all. I don't see userspace
> > asking for association, or anything else, so what would it be waiting
> > for?
> 
> I should have explained this better. I dug up the old thread from the
> original report and IIRC this is the sequence of events:
> 
> 1. Begin reassociation to new BSS via CMD_CONNECT.
> 

Oh, CMD_CONNECT, OK.

>  This results in the
> kernel sending many events to remove the current BSS in favor of the
> new one:
> 
> 2. Receive DEL_STATION, DEAUTHENTICATE, DISCONNECT, NEW_STATION

So far, so good.

> 3. Then a CMD_AUTHENTICATE, with a success status.

Sure, that's what we see.

> 4. At this point the firmware decides its not gonna continue and calls
> __ieee80211_disconnect which is a no-op when not associated.
> 

This is also completely irrelevant to the state. The firmware doesn't
"decide not to continue". Something else already decided it doesn't want
to continue. This is entirely driven from the top - be it the
net/wireless/sme.c for CMD_CONNECT, or userspace for CMD_AUTHENTICATE
and CMD_ASSOCIATE.

> We assumed
> either CMD_ASSOCIATE or CMD_CONNECT will come after CMD_AUTHENTICATE,
> or a CMD_DEAUTH if there was a problem.

Yeah for CMD_CONNECT I guess that's reasonable. In fact you really
shouldn't even look at the authenticate/associate even in these cases
since they're not guaranteed (e.g. if you have a fullmac device), they
only happen as a consequence of the mac80211 implementation, not by
design.

So I think you're just looking in the wrong place - the real question is
why the association sequence in net/wireless/sme.c doesn't continue (or
abort) at this point?

The whole "firmware event" thing is completely incidental and an
implementation detail. The firmware decides nothing. All that happens
here is that the firmware decided to no longer wait for the association
to happen (but that wasn't even started), and a driver bug that
basically just prints the wrong message - it says it's expecting to be
associated but it can't actually expect that since we never even
transmitted an association request.

> I will say, we do see a CMD_DEL_STATION event here which we never
> processed in this case.

And you probably shouldn't anyway, again more of an implementation
detail, not really by design.

> But again, I would expect a CMD_DEAUTHENTICATE
> since we successfully authenticated, right?

No, you can't expect that, you could be authenticated with the AP for an
indefinite amount of time, or never hear the deauth frame (if it ever
sends one).

johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated
  2023-01-18 18:08     ` Johannes Berg
@ 2023-01-18 18:39       ` James Prestwood
  2023-01-18 19:39         ` Johannes Berg
  0 siblings, 1 reply; 8+ messages in thread
From: James Prestwood @ 2023-01-18 18:39 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On Wed, 2023-01-18 at 19:08 +0100, Johannes Berg wrote:
> On Wed, 2023-01-18 at 09:54 -0800, James Prestwood wrote:
> > > 

<snip>
> 

> > 4. At this point the firmware decides its not gonna continue and
> > calls
> > __ieee80211_disconnect which is a no-op when not associated.
> > 
> 
> This is also completely irrelevant to the state. The firmware doesn't
> "decide not to continue". Something else already decided it doesn't
> want
> to continue. This is entirely driven from the top - be it the
> net/wireless/sme.c for CMD_CONNECT, or userspace for CMD_AUTHENTICATE
> and CMD_ASSOCIATE.

Well, we see this:

iwlwifi 0000:00:14.3: Not associated and the session protection is over
already...

So _something_ happens in the driver/firmware causing this event to be
generated, thats all I'm saying. But yes, it really doesn't matter why
but rather how mac80211 handles it.

> 
> > We assumed
> > either CMD_ASSOCIATE or CMD_CONNECT will come after
> > CMD_AUTHENTICATE,
> > or a CMD_DEAUTH if there was a problem.
> 
> Yeah for CMD_CONNECT I guess that's reasonable. In fact you really
> shouldn't even look at the authenticate/associate even in these cases
> since they're not guaranteed (e.g. if you have a fullmac device),
> they
> only happen as a consequence of the mac80211 implementation, not by
> design.
> 
> So I think you're just looking in the wrong place - the real question
> is
> why the association sequence in net/wireless/sme.c doesn't continue
> (or
> abort) at this point?

I think I narrowed down this "why" pretty well. A connection loss event
happens between a successful authentication but before association.
Since __ieee80211_disconnect does not take this state into account the
kernel haults the reassociation and never informs userspace.

Maybe my solution/fix is incorrect, but its at least a starting point.

> 
> The whole "firmware event" thing is completely incidental and an
> implementation detail. The firmware decides nothing. All that happens
> here is that the firmware decided to no longer wait for the
> association
> to happen (but that wasn't even started), and a driver bug that
> basically just prints the wrong message - it says it's expecting to
> be
> associated but it can't actually expect that since we never even
> transmitted an association request.
> 
> > I will say, we do see a CMD_DEL_STATION event here which we never
> > processed in this case.
> 
> And you probably shouldn't anyway, again more of an implementation
> detail, not really by design.
> 
> > But again, I would expect a CMD_DEAUTHENTICATE
> > since we successfully authenticated, right?
> 
> No, you can't expect that, you could be authenticated with the AP for
> an
> indefinite amount of time, or never hear the deauth frame (if it ever
> sends one).

Ok, so at least a CMD_CONNECT event right?

Maybe I'm giving nl80211 too much credit, but the
CMD_AUTH/ASSOC/CONNECT APIs have always seemed symmetric in terms of
events, in that if you issue one of these commands you will get an
event in return. So I would expect CMD_CONNECT to generate a
CMD_CONNECT event.

> 
> johannes



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated
  2023-01-18 18:39       ` James Prestwood
@ 2023-01-18 19:39         ` Johannes Berg
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Berg @ 2023-01-18 19:39 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

On Wed, 2023-01-18 at 10:39 -0800, James Prestwood wrote:
> 
> iwlwifi 0000:00:14.3: Not associated and the session protection is over
> already...
> 
> So _something_ happens in the driver/firmware causing this event to be
> generated, thats all I'm saying.
> 

Yes. But that's also irrelevant.

> But yes, it really doesn't matter why
> but rather how mac80211 handles it.

No, it doesn't matter how mac80211 handles it. The fact that the driver
even tells mac80211 about it by calling the disconnect API is already a
bug - just one that doesn't matter now because mac80211 ignores it.

> > So I think you're just looking in the wrong place - the real question
> > is
> > why the association sequence in net/wireless/sme.c doesn't continue
> > (or
> > abort) at this point?
> 
> I think I narrowed down this "why" pretty well.

No, you haven't.

>  A connection loss event
> happens between a successful authentication but before association.

But it doesn't!

There's no "connection loss" event, it's just the driver being confused.

The flow is something like this:

 * authenticate
   -> driver prepare for authentication
   -> send auth frame
   -> get auth response
 * firmware waits for association but that never happens
 * driver prints a message and calls API

but since there was never even an *attempt* to associate, that whole end
of the "time event" and the message is irrelevant

If there was an attempt to associate it wouldn't even matter if it was
before or after the end of the time event, because if it already ended
the driver would just create a new one.

In fact, the whole reason we don't abort the time event after successful
authentication and schedule a new one on association is just an
optimisation - it's nicer to the firmware to just have one, and normally
we finish auth + assoc within a few hundred ms.


> Since __ieee80211_disconnect does not take this state into account the
> kernel haults the reassociation and never informs userspace.

No no - there's no "the kernel halts the reassociation" in this case.
You're confusing cause and effect. The *reason* we get this message and
the pointless/buggy/... call to __ieee80211_disconnect() is that there's
never an attempt to associate. It's not *causing* the lack thereof.

> Maybe my solution/fix is incorrect, but its at least a starting point.

I don't see it that way - something is clearly broken in that there's no
association attempt, but I still don't know what. All you've done is
created a special iwlwifi fallback path to let userspace recover from
it, not actually addressed the bug.

> > No, you can't expect that, you could be authenticated with the AP for
> > an
> > indefinite amount of time, or never hear the deauth frame (if it ever
> > sends one).
> 
> Ok, so at least a CMD_CONNECT event right?
> 
> Maybe I'm giving nl80211 too much credit, but the
> CMD_AUTH/ASSOC/CONNECT APIs have always seemed symmetric in terms of
> events, in that if you issue one of these commands you will get an
> event in return. So I would expect CMD_CONNECT to generate a
> CMD_CONNECT event.
> 

Yes, I would also expect a CMD_CONNECT event, via
nl80211_send_connect_result() / __cfg80211_connect_result().

But something is going wrong. I think we need to look into probably
cfg80211_sme_rx_auth() - why is that not continuing the state machine?
Surely wdev->conn didn't go away, so maybe for some reason in this case
we already have wdev->conn->state == CFG80211_CONN_CONNECTED? But even
in case of reassoc, wdev->conn is freshly allocated and should be
zeroed, at least initially. But maybe some of the events mac80211
generates during the disconnect messes with the state?

The other cases in cfg80211_sme_rx_auth() seem to generate an event
already one way or the other.

johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-18 19:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-25 20:34 [PATCH 0/1] Fix __ieee80211_disconnect when not associated James Prestwood
2022-10-25 20:34 ` [PATCH] wifi: mac80211: handle connection loss in __ieee80211_disconnect James Prestwood
2023-01-18 16:04   ` Johannes Berg
2023-01-18 16:06 ` [PATCH 0/1] Fix __ieee80211_disconnect when not associated Johannes Berg
2023-01-18 17:54   ` James Prestwood
2023-01-18 18:08     ` Johannes Berg
2023-01-18 18:39       ` James Prestwood
2023-01-18 19:39         ` Johannes Berg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.