* [PATCH] station: fix reentrant FT wiphy radio work insertion
@ 2023-05-01 22:06 James Prestwood
2023-05-08 17:56 ` Denis Kenzior
0 siblings, 1 reply; 3+ messages in thread
From: James Prestwood @ 2023-05-01 22:06 UTC (permalink / raw)
To: iwd; +Cc: James Prestwood
The failure path for FT was inserting the same work item inside
the 'do_work' callback. This caused duplicate work items to be
inside the radio work queue, and upon destroy the ID would get
zero'ed out. This caused a host of valgrind complaints when this
behavior is triggered.
This can be seen with slight modifiations to the FT test:
src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:02
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 6
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 7
src/wiphy.c:wiphy_radio_work_done() Work item 5 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 6
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 6 done
--- Work item 7 is 'station_ft_work_ready()' ---
src/wiphy.c:wiphy_radio_work_next() Starting work item 7
src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:03
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 8
--- Here we insert work item 9, which is a duplicate of 7 ---
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 9
src/wiphy.c:wiphy_radio_work_next() Starting work item 8
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 8 done
--- And finally, the duplicated work item starts ---
src/wiphy.c:wiphy_radio_work_next() Starting work item 0
src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
src/station.c:station_roam_failed() 9
src/station.c:station_roam_trigger_cb() 9
---
src/station.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/src/station.c b/src/station.c
index 0d8f7628..3c6e43ce 100644
--- a/src/station.c
+++ b/src/station.c
@@ -122,6 +122,7 @@ struct station {
uint32_t wiphy_watch;
+ struct l_idle *ft_idle;
struct wiphy_radio_work_item ft_work;
bool preparing_roam : 1;
@@ -1739,6 +1740,11 @@ static void station_roam_state_clear(struct station *station)
station->roam_freqs = NULL;
}
+ if (station->ft_idle) {
+ l_idle_remove(station->ft_idle);
+ station->ft_idle = NULL;
+ }
+
l_queue_clear(station->roam_bss_list, l_free);
ft_clear_authentications(netdev_get_ifindex(station->netdev));
@@ -2282,6 +2288,16 @@ static void station_preauthenticate_cb(struct netdev *netdev,
static void station_transition_start(struct station *station);
+static void station_transition_idle(struct l_idle *idle, void *user_data)
+{
+ struct station *station = user_data;
+
+ l_idle_remove(station->ft_idle);
+ station->ft_idle = NULL;
+
+ station_transition_start(station);
+}
+
static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
{
struct station *station = l_container_of(item, struct station, ft_work);
@@ -2300,7 +2316,8 @@ static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
if (ret == -ENOENT) {
station_debug_event(station, "ft-roam-failed");
try_next:
- station_transition_start(station);
+ station->ft_idle = l_idle_create(station_transition_idle,
+ station, NULL);
return true;
} else if (ret < 0)
goto assoc_failed;
--
2.25.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] station: fix reentrant FT wiphy radio work insertion
2023-05-01 22:06 [PATCH] station: fix reentrant FT wiphy radio work insertion James Prestwood
@ 2023-05-08 17:56 ` Denis Kenzior
2023-05-08 18:10 ` James Prestwood
0 siblings, 1 reply; 3+ messages in thread
From: Denis Kenzior @ 2023-05-08 17:56 UTC (permalink / raw)
To: James Prestwood, iwd
Hi James,
On 5/1/23 17:06, James Prestwood wrote:
> The failure path for FT was inserting the same work item inside
> the 'do_work' callback. This caused duplicate work items to be
> inside the radio work queue, and upon destroy the ID would get
> zero'ed out. This caused a host of valgrind complaints when this
> behavior is triggered.
>
> This can be seen with slight modifiations to the FT test:
Typo here, 'modifiations'
>
> src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:02
> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 6
> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 7
> src/wiphy.c:wiphy_radio_work_done() Work item 5 done
> src/wiphy.c:wiphy_radio_work_next() Starting work item 6
> src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
> src/ft.c:ft_send_authenticate()
> src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
> src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
> src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
> src/wiphy.c:wiphy_radio_work_done() Work item 6 done
>
> --- Work item 7 is 'station_ft_work_ready()' ---
>
> src/wiphy.c:wiphy_radio_work_next() Starting work item 7
> src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
> src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:03
> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 8
>
> --- Here we insert work item 9, which is a duplicate of 7 ---
>
Duplicate in what way? We're going to a different BSS? Or do you mean that we
overwrite station->ft_work because we queue the same work item in the work queue
callback?
> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 9
> src/wiphy.c:wiphy_radio_work_next() Starting work item 8
> src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
> src/ft.c:ft_send_authenticate()
> src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
> src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
> src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
> src/wiphy.c:wiphy_radio_work_done() Work item 8 done
>
> --- And finally, the duplicated work item starts ---
>
> src/wiphy.c:wiphy_radio_work_next() Starting work item 0
> src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
> src/station.c:station_roam_failed() 9
> src/station.c:station_roam_trigger_cb() 9
> ---
> src/station.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/src/station.c b/src/station.c
> index 0d8f7628..3c6e43ce 100644
> --- a/src/station.c
> +++ b/src/station.c
> @@ -122,6 +122,7 @@ struct station {
>
> uint32_t wiphy_watch;
>
> + struct l_idle *ft_idle;
> struct wiphy_radio_work_item ft_work;
>
> bool preparing_roam : 1;
> @@ -1739,6 +1740,11 @@ static void station_roam_state_clear(struct station *station)
> station->roam_freqs = NULL;
> }
>
> + if (station->ft_idle) {
> + l_idle_remove(station->ft_idle);
> + station->ft_idle = NULL;
> + }
> +
> l_queue_clear(station->roam_bss_list, l_free);
>
> ft_clear_authentications(netdev_get_ifindex(station->netdev));
> @@ -2282,6 +2288,16 @@ static void station_preauthenticate_cb(struct netdev *netdev,
>
> static void station_transition_start(struct station *station);
>
> +static void station_transition_idle(struct l_idle *idle, void *user_data)
> +{
> + struct station *station = user_data;
> +
> + l_idle_remove(station->ft_idle);
> + station->ft_idle = NULL;
> +
> + station_transition_start(station);
> +}
> +
> static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
> {
> struct station *station = l_container_of(item, struct station, ft_work);
> @@ -2300,7 +2316,8 @@ static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
> if (ret == -ENOENT) {
> station_debug_event(station, "ft-roam-failed");
> try_next:
> - station_transition_start(station);
> + station->ft_idle = l_idle_create(station_transition_idle,
> + station, NULL);
> return true;
> } else if (ret < 0)
> goto assoc_failed;
The patch makes sense to me. However, I do wonder whether we can take care of
this in the wiphy work queue itself? Maybe by adding wiphy_work_reschedule() or
something to that effect? I would hate to keep doing this l_idle business every
time we need to do something similar.
Regards,
-Denis
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] station: fix reentrant FT wiphy radio work insertion
2023-05-08 17:56 ` Denis Kenzior
@ 2023-05-08 18:10 ` James Prestwood
0 siblings, 0 replies; 3+ messages in thread
From: James Prestwood @ 2023-05-08 18:10 UTC (permalink / raw)
To: Denis Kenzior, iwd
Hi Denis,
On 5/8/23 10:56 AM, Denis Kenzior wrote:
> Hi James,
>
> On 5/1/23 17:06, James Prestwood wrote:
>> The failure path for FT was inserting the same work item inside
>> the 'do_work' callback. This caused duplicate work items to be
>> inside the radio work queue, and upon destroy the ID would get
>> zero'ed out. This caused a host of valgrind complaints when this
>> behavior is triggered.
>>
>> This can be seen with slight modifiations to the FT test:
>
> Typo here, 'modifiations'
>
>>
>> src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:02
>> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 6
>> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 7
>> src/wiphy.c:wiphy_radio_work_done() Work item 5 done
>> src/wiphy.c:wiphy_radio_work_next() Starting work item 6
>> src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
>> src/ft.c:ft_send_authenticate()
>> src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
>> src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
>> src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on
>> Channel(56)
>> src/wiphy.c:wiphy_radio_work_done() Work item 6 done
>>
>> --- Work item 7 is 'station_ft_work_ready()' ---
>>
>> src/wiphy.c:wiphy_radio_work_next() Starting work item 7
>> src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
>> src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:03
>> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 8
>>
>> --- Here we insert work item 9, which is a duplicate of 7 ---
>>
>
> Duplicate in what way? We're going to a different BSS? Or do you mean
> that we overwrite station->ft_work because we queue the same work item
> in the work queue callback?
That we are using the same ft_work structure which already exists in the
queue. So when it gets cleaned up it modifies both queue items.
>
>> src/wiphy.c:wiphy_radio_work_insert() Inserting work item 9
>> src/wiphy.c:wiphy_radio_work_next() Starting work item 8
>> src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
>> src/ft.c:ft_send_authenticate()
>> src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
>> src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
>> src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on
>> Channel(56)
>> src/wiphy.c:wiphy_radio_work_done() Work item 8 done
>>
>> --- And finally, the duplicated work item starts ---
>>
>> src/wiphy.c:wiphy_radio_work_next() Starting work item 0
>> src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
>> src/station.c:station_roam_failed() 9
>> src/station.c:station_roam_trigger_cb() 9
>> ---
>> src/station.c | 19 ++++++++++++++++++-
>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/station.c b/src/station.c
>> index 0d8f7628..3c6e43ce 100644
>> --- a/src/station.c
>> +++ b/src/station.c
>> @@ -122,6 +122,7 @@ struct station {
>> uint32_t wiphy_watch;
>> + struct l_idle *ft_idle;
>> struct wiphy_radio_work_item ft_work;
>> bool preparing_roam : 1;
>> @@ -1739,6 +1740,11 @@ static void station_roam_state_clear(struct
>> station *station)
>> station->roam_freqs = NULL;
>> }
>> + if (station->ft_idle) {
>> + l_idle_remove(station->ft_idle);
>> + station->ft_idle = NULL;
>> + }
>> +
>> l_queue_clear(station->roam_bss_list, l_free);
>> ft_clear_authentications(netdev_get_ifindex(station->netdev));
>> @@ -2282,6 +2288,16 @@ static void station_preauthenticate_cb(struct
>> netdev *netdev,
>> static void station_transition_start(struct station *station);
>> +static void station_transition_idle(struct l_idle *idle, void
>> *user_data)
>> +{
>> + struct station *station = user_data;
>> +
>> + l_idle_remove(station->ft_idle);
>> + station->ft_idle = NULL;
>> +
>> + station_transition_start(station);
>> +}
>> +
>> static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
>> {
>> struct station *station = l_container_of(item, struct station,
>> ft_work);
>> @@ -2300,7 +2316,8 @@ static bool station_ft_work_ready(struct
>> wiphy_radio_work_item *item)
>> if (ret == -ENOENT) {
>> station_debug_event(station, "ft-roam-failed");
>> try_next:
>> - station_transition_start(station);
>> + station->ft_idle = l_idle_create(station_transition_idle,
>> + station, NULL);
>> return true;
>> } else if (ret < 0)
>> goto assoc_failed;
>
> The patch makes sense to me. However, I do wonder whether we can take
> care of this in the wiphy work queue itself? Maybe by adding
> wiphy_work_reschedule() or something to that effect? I would hate to
> keep doing this l_idle business every time we need to do something similar.
I was thinking that, maybe adding an API would work. I'll see if that is
possible. Basically the problem I ran into trying to handle it
automatically was the fact that the work item memory is provided by the
caller, so when you change one it changes both.
>
> Regards,
> -Denis
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-05-08 18:10 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-01 22:06 [PATCH] station: fix reentrant FT wiphy radio work insertion James Prestwood
2023-05-08 17:56 ` Denis Kenzior
2023-05-08 18:10 ` James Prestwood
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).