iwd.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix wiphy regdom logic
@ 2023-05-10 20:41 James Prestwood
  2023-05-10 20:41 ` [PATCH 1/1] wiphy: fix regdom change wiphy dump logic James Prestwood
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: James Prestwood @ 2023-05-10 20:41 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

Below are some logs when this behavior was seen. IWD never
transitions out of the ft-roaming state due to the key setting
callbacks never being called. Its hard to know exactly what
messages were sent/received since its coming from iwmon but
netlink appears to completely lock up until a watchdog restarts
the IWD service.

After the patch was applied there were zero instances of this
behavior which seems to indicate the kernel is unable to handle
multiple GET_WIPHY calls.

May 10 00:11:54 iwd[336880]: src/station.c:station_try_next_transition() 14, target aa:46:8d:1c:16:e6
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 14498
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 14499
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_done() Work item 14497 done
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_next() Starting work item 14498
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
May 10 00:11:54 iwd[336880]: src/ft.c:ft_send_authenticate()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
May 10 00:11:54 kernel: wlan0: disconnect from AP aa:46:8d:1c:17:25 for new assoc to aa:46:8d:1c:16:e6
May 10 00:11:54 locus[917]: State changed to roaming
May 10 00:11:54 locus[917]: Start 10000ms timeout waiting for connected/roaming
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_unicast_notify() Unicast notification Frame(59)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_done() Work item 14498 done
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_next() Starting work item 14499
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_cqm_rssi_update()
May 10 00:11:54 iwd[336880]: src/station.c:station_enter_state() Old State: connected, new state: ft-roaming
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_deauthenticate_event()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_disconnect_event()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
May 10 00:11:54 kernel: wlan0: associate with aa:46:8d:1c:16:e6 (try 1/3)
May 10 00:11:54 kernel: wlan0: RX ReassocResp from aa:46:8d:1c:16:e6 (capab=0x1511 status=0 aid=7)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
May 10 00:11:54 systemd-networkd[393]: wlan0: Lost carrier
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_associate_event()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_set_gtk() ifindex=14 key_idx=2
May 10 00:11:54 iwd[336880]: src/station.c:station_handshake_event() Setting keys
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_set_tk() ifindex=14 key_idx=0
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
May 10 00:11:54 iwd[336880]: src/netdev.c:parse_request_ies()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_get_oci()
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is US
May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_cancel_last_dump() Canceling pending regdom wiphy dump (global)
May 10 00:11:54 kernel: wlan0: associated
May 10 00:11:54 kernel: ath: EEPROM regdomain: 0x8348
May 10 00:11:54 kernel: ath: EEPROM indicates we should expect a country code
May 10 00:11:54 kernel: ath: doing EEPROM country->regdmn map search
May 10 00:11:54 kernel: ath: country maps to regdmn code: 0x3a
May 10 00:11:54 kernel: ath: Country alpha2 being used: US
May 10 00:11:54 kernel: ath: Regpair used: 0x3a
May 10 00:11:54 kernel: ath: regdomain 0x8348 dynamically updated by country element
May 10 00:11:55 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
May 10 00:11:55 iwd[336880]: src/netdev.c:netdev_cqm_event() Signal change event (above=0 signal=-76)

--- Nothing here for 10 seconds, until the watchdog triggers ---

May 10 00:12:04 locus[917]: IWD watchdog triggered (reason=state)
May 10 00:12:04 iwd[336880]: Terminate
May 10 00:12:04 iwd[336880]: src/netdev.c:netdev_free() Freeing netdev wlan0[14]
May 10 00:12:04 iwd[336880]: src/device.c:device_free()
May 10 00:12:04 iwd[336880]: src/station.c:station_free()
May 10 00:12:04 iwd[336880]: src/netconfig.c:netconfig_destroy()
May 10 00:12:04 iwd[336880]: src/netconfig.c:netconfig_event_handler() l_netconfig event 2
May 10 00:12:04 iwd[336880]: src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: 10.193.158.118
May 10 00:12:04 iwd[336880]: src/resolve.c:resolve_systemd_revert() ifindex: 14
May 10 00:12:04 systemd[1]: Stopping Wireless service...

The iwmon logs are lengthy, but I've included the bit where the
logs abruptly stop for 10 seconds until the watchdog kicks in.
Prior there was no key setting or anything connection related
after the connect event.

May 10 00:11:55:     Extended Capabilities: len 8
May 10 00:11:55:         Capability: bit  2: Extended channel switching
May 10 00:11:55:         Capability: bit 62: Opmode Notification
May 10 00:11:55:     Extended Capabilities Mask: 170 len 8
May 10 00:12:04:         04 00 00 00 00 00 00 40                          .......@
May 10 00:12:04:     VHT Capability Mask: 176 len 12
May 10 00:12:04:         f0 1f 80 33 ff ff 00 00 ff ff 00 00              ...3........
May 10 00:12:04:     MAC Address 00:0E:8E:A3:02:FE
May 10 00:12:04: > Result: New Wiphy (0x03) len 36 [multi]                     1683677515.942843
May 10 00:12:04:     Wiphy: 0 (0x00000000)
May 10 00:12:04:     Wiphy Name: phy0
May 10 00:12:04:     Generation: 1 (0x00000001)
May 10 00:12:04:     Max AP Assoc Station: 202 len 4
May 10 00:12:04:         20 00 00 00    

James Prestwood (1):
  wiphy: fix regdom change wiphy dump logic

 src/wiphy.c | 103 +++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 49 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] wiphy: fix regdom change wiphy dump logic
  2023-05-10 20:41 [PATCH 0/1] Fix wiphy regdom logic James Prestwood
@ 2023-05-10 20:41 ` James Prestwood
  2023-05-10 20:41 ` [PATCH 1/2] " James Prestwood
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: James Prestwood @ 2023-05-10 20:41 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

Certain drivers change the regulatory domain when the client roams.
First XX, then the country the roam target is using. Depending on the
timing the second regdom event can come while the first is still
dumping. This was not handled gracefully/nicely in IWD and it would
just issue another dump (it tries to cancel but if the dump started
there isn't anything that can be done). AFAIK the kernel should be
able to handle this but based on the behavior we saw it seems to not
be able to, or at least not reliably.

When the second dump comes in and GET_WIPHY is issued the kernel
stops replying to any messages which obviously causes IWD to completely
hang up waiting for various acks (set_station, set_key, etc.).

Now if another dump is ongoing (and cant be safely canceled) a flag
will be set when another dump is needed. The first dump will be
waited for and only once completed will another start. This also
ensures there is at most 1 pending dump when before any number of
dumps could be queued which is pointless since only the last dump
request matters.
---
 src/wiphy.c | 103 +++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/src/wiphy.c b/src/wiphy.c
index 2db2d2cd..dbffa44e 100644
--- a/src/wiphy.c
+++ b/src/wiphy.c
@@ -67,6 +67,7 @@ static int mac_randomize_bytes = 6;
 static char regdom_country[2];
 static uint32_t work_ids;
 static unsigned int wiphy_dump_id;
+static bool dump_queued;
 
 enum driver_flag {
 	DEFAULT_IF = 0x1,
@@ -123,6 +124,7 @@ struct wiphy {
 	bool work_in_callback;
 	unsigned int get_reg_id;
 	unsigned int dump_id;
+	bool dump_queued : 1;
 
 	bool support_scheduled_scan:1;
 	bool support_rekey_offload:1;
@@ -2066,18 +2068,23 @@ static void wiphy_setup_rm_enabled_capabilities(struct wiphy *wiphy)
 	 */
 }
 
+static void wiphy_dump_after_regdom(struct wiphy *wiphy);
+
 static void wiphy_dump_done(void *user_data)
 {
 	struct wiphy *wiphy = user_data;
 	const struct l_queue_entry *e;
 
-	/* This dump was canceled due to another dump */
-	if ((wiphy && !wiphy->dump_id) || (!wiphy && !wiphy_dump_id))
-		return;
-
 	if (wiphy) {
 		wiphy->dump_id = 0;
 
+		if (wiphy->dump_queued) {
+			wiphy->dump_queued = false;
+			wiphy_dump_after_regdom(wiphy);
+			return;
+		}
+
+		/* No other dumps queued, emit done */
 		WATCHLIST_NOTIFY(&wiphy->state_watches,
 				wiphy_state_watch_func_t, wiphy,
 				WIPHY_STATE_WATCH_EVENT_REGDOM_DONE);
@@ -2087,6 +2094,12 @@ static void wiphy_dump_done(void *user_data)
 
 	wiphy_dump_id = 0;
 
+	if (dump_queued) {
+		dump_queued = false;
+		wiphy_dump_after_regdom(NULL);
+		return;
+	}
+
 	for (e = l_queue_get_entries(wiphy_list); e; e = e->next) {
 		wiphy = e->data;
 
@@ -2160,38 +2173,48 @@ static void wiphy_dump_callback(struct l_genl_msg *msg,
 	}
 }
 
-static bool wiphy_cancel_last_dump(struct wiphy *wiphy)
+static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 {
-	unsigned int id = 0;
+	const struct l_queue_entry *e;
+	struct l_genl_msg *msg;
+	unsigned int *id = NULL;
 
-	/*
-	 * Zero command ID to signal that wiphy_dump_done doesn't need to do
-	 * anything.
-	 */
-	if (wiphy && wiphy->dump_id) {
-		id = wiphy->dump_id;
-		wiphy->dump_id = 0;
-	} else if (!wiphy && wiphy_dump_id) {
-		id = wiphy_dump_id;
-		wiphy_dump_id = 0;
-	}
+	if (wiphy)
+		id = &wiphy->dump_id;
+	else
+		id = &wiphy_dump_id;
 
-	if (id) {
-		l_debug("Canceling pending regdom wiphy dump (%s)",
+	if (*id) {
+		/*
+		 * If the dump hasn't hit the kernel we can safely cancel it.
+		 * Otherwise set the dump_queued flag and once the current dump
+		 * has completed, another will start. This also ensures there is
+		 * only at most 1 pending dump.
+		 */
+		if (!l_genl_family_request_sent(nl80211, *id)) {
+			l_debug("Canceling pending regdom wiphy dump (%s)",
 					wiphy ? wiphy->name : "global");
 
-		l_genl_family_cancel(nl80211, id);
+			l_genl_family_cancel(nl80211, *id);
+
+			/*
+			 * If there is already dump queued (dump_queued=true)
+			 * the destroy callback will end up starting another
+			 * dump. If this happens nothing more needs to be done.
+			 */
+			if (*id)
+				return;
+		} else if (wiphy) {
+			wiphy->dump_queued = true;
+			return;
+		} else {
+			dump_queued = true;
+			return;
+		}
 	}
 
-	return id != 0;
-}
-
-static void wiphy_dump_after_regdom(struct wiphy *wiphy)
-{
-	const struct l_queue_entry *e;
-	struct l_genl_msg *msg;
-	unsigned int id;
-	bool no_start_event;
+	l_debug("Dumping wiphy after regdom change (%s)",
+			wiphy ? wiphy->name : "global");
 
 	msg = l_genl_msg_new_sized(NL80211_CMD_GET_WIPHY, 128);
 
@@ -2199,37 +2222,22 @@ static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 		l_genl_msg_append_attr(msg, NL80211_ATTR_WIPHY, 4, &wiphy->id);
 
 	l_genl_msg_append_attr(msg, NL80211_ATTR_SPLIT_WIPHY_DUMP, 0, NULL);
-	id = l_genl_family_dump(nl80211, msg, wiphy_dump_callback,
+	*id = l_genl_family_dump(nl80211, msg, wiphy_dump_callback,
 						wiphy, wiphy_dump_done);
-	if (!id) {
+	if (!*id) {
 		l_error("Wiphy information dump failed");
 		l_genl_msg_unref(msg);
 		return;
 	}
 
-	/*
-	 * Another update while dumping wiphy. This next dump should supercede
-	 * the first and not result in a DONE event until this new dump is
-	 * finished. This is because the disabled frequencies are in an unknown
-	 * state and could cause incorrect behavior by any watchers.
-	 */
-	no_start_event = wiphy_cancel_last_dump(wiphy);
-
 	/* Limited dump so just emit the event for this wiphy */
 	if (wiphy) {
-		wiphy->dump_id = id;
-
-		if (no_start_event)
-			return;
-
 		WATCHLIST_NOTIFY(&wiphy->state_watches,
 				wiphy_state_watch_func_t, wiphy,
 				WIPHY_STATE_WATCH_EVENT_REGDOM_STARTED);
 		return;
 	}
 
-	wiphy_dump_id = id;
-
 	/* Otherwise for a global regdom change notify for all wiphy's */
 	for (e = l_queue_get_entries(wiphy_list); e; e = e->next) {
 		struct wiphy *w = e->data;
@@ -2237,9 +2245,6 @@ static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 		if (w->self_managed)
 			continue;
 
-		if (no_start_event)
-			continue;
-
 		WATCHLIST_NOTIFY(&w->state_watches, wiphy_state_watch_func_t,
 				w, WIPHY_STATE_WATCH_EVENT_REGDOM_STARTED);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 1/2] wiphy: fix regdom change wiphy dump logic
  2023-05-10 20:41 [PATCH 0/1] Fix wiphy regdom logic James Prestwood
  2023-05-10 20:41 ` [PATCH 1/1] wiphy: fix regdom change wiphy dump logic James Prestwood
@ 2023-05-10 20:41 ` James Prestwood
  2023-05-10 20:41 ` [PATCH 2/2] station: fix reentrant FT wiphy radio work insertion James Prestwood
  2023-05-10 20:43 ` [PATCH 0/1] Fix wiphy regdom logic James Prestwood
  3 siblings, 0 replies; 5+ messages in thread
From: James Prestwood @ 2023-05-10 20:41 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

Certain drivers change the regulatory domain when the client roams.
First XX, then the country the roam target is using. Depending on the
timing the second regdom event can come while the first is still
dumping. This was not handled gracefully/nicely in IWD and it would
just issue another dump (it tries to cancel but if the dump started
there isn't anything that can be done). AFAIK the kernel should be
able to handle this but based on the behavior we saw it seems to not
be able to, or at least not reliably.

When the second dump comes in and GET_WIPHY is issued the kernel
stops replying to any messages which obviously causes IWD to completely
hang up waiting for various acks (set_station, set_key, etc.).

Now if another dump is ongoing (and cant be safely canceled) a flag
will be set when another dump is needed. The first dump will be
waited for and only once completed will another start. This also
ensures there is at most 1 pending dump when before any number of
dumps could be queued which is pointless since only the last dump
request matters.
---
 src/wiphy.c | 103 +++++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/src/wiphy.c b/src/wiphy.c
index 2db2d2cd..dbffa44e 100644
--- a/src/wiphy.c
+++ b/src/wiphy.c
@@ -67,6 +67,7 @@ static int mac_randomize_bytes = 6;
 static char regdom_country[2];
 static uint32_t work_ids;
 static unsigned int wiphy_dump_id;
+static bool dump_queued;
 
 enum driver_flag {
 	DEFAULT_IF = 0x1,
@@ -123,6 +124,7 @@ struct wiphy {
 	bool work_in_callback;
 	unsigned int get_reg_id;
 	unsigned int dump_id;
+	bool dump_queued : 1;
 
 	bool support_scheduled_scan:1;
 	bool support_rekey_offload:1;
@@ -2066,18 +2068,23 @@ static void wiphy_setup_rm_enabled_capabilities(struct wiphy *wiphy)
 	 */
 }
 
+static void wiphy_dump_after_regdom(struct wiphy *wiphy);
+
 static void wiphy_dump_done(void *user_data)
 {
 	struct wiphy *wiphy = user_data;
 	const struct l_queue_entry *e;
 
-	/* This dump was canceled due to another dump */
-	if ((wiphy && !wiphy->dump_id) || (!wiphy && !wiphy_dump_id))
-		return;
-
 	if (wiphy) {
 		wiphy->dump_id = 0;
 
+		if (wiphy->dump_queued) {
+			wiphy->dump_queued = false;
+			wiphy_dump_after_regdom(wiphy);
+			return;
+		}
+
+		/* No other dumps queued, emit done */
 		WATCHLIST_NOTIFY(&wiphy->state_watches,
 				wiphy_state_watch_func_t, wiphy,
 				WIPHY_STATE_WATCH_EVENT_REGDOM_DONE);
@@ -2087,6 +2094,12 @@ static void wiphy_dump_done(void *user_data)
 
 	wiphy_dump_id = 0;
 
+	if (dump_queued) {
+		dump_queued = false;
+		wiphy_dump_after_regdom(NULL);
+		return;
+	}
+
 	for (e = l_queue_get_entries(wiphy_list); e; e = e->next) {
 		wiphy = e->data;
 
@@ -2160,38 +2173,48 @@ static void wiphy_dump_callback(struct l_genl_msg *msg,
 	}
 }
 
-static bool wiphy_cancel_last_dump(struct wiphy *wiphy)
+static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 {
-	unsigned int id = 0;
+	const struct l_queue_entry *e;
+	struct l_genl_msg *msg;
+	unsigned int *id = NULL;
 
-	/*
-	 * Zero command ID to signal that wiphy_dump_done doesn't need to do
-	 * anything.
-	 */
-	if (wiphy && wiphy->dump_id) {
-		id = wiphy->dump_id;
-		wiphy->dump_id = 0;
-	} else if (!wiphy && wiphy_dump_id) {
-		id = wiphy_dump_id;
-		wiphy_dump_id = 0;
-	}
+	if (wiphy)
+		id = &wiphy->dump_id;
+	else
+		id = &wiphy_dump_id;
 
-	if (id) {
-		l_debug("Canceling pending regdom wiphy dump (%s)",
+	if (*id) {
+		/*
+		 * If the dump hasn't hit the kernel we can safely cancel it.
+		 * Otherwise set the dump_queued flag and once the current dump
+		 * has completed, another will start. This also ensures there is
+		 * only at most 1 pending dump.
+		 */
+		if (!l_genl_family_request_sent(nl80211, *id)) {
+			l_debug("Canceling pending regdom wiphy dump (%s)",
 					wiphy ? wiphy->name : "global");
 
-		l_genl_family_cancel(nl80211, id);
+			l_genl_family_cancel(nl80211, *id);
+
+			/*
+			 * If there is already dump queued (dump_queued=true)
+			 * the destroy callback will end up starting another
+			 * dump. If this happens nothing more needs to be done.
+			 */
+			if (*id)
+				return;
+		} else if (wiphy) {
+			wiphy->dump_queued = true;
+			return;
+		} else {
+			dump_queued = true;
+			return;
+		}
 	}
 
-	return id != 0;
-}
-
-static void wiphy_dump_after_regdom(struct wiphy *wiphy)
-{
-	const struct l_queue_entry *e;
-	struct l_genl_msg *msg;
-	unsigned int id;
-	bool no_start_event;
+	l_debug("Dumping wiphy after regdom change (%s)",
+			wiphy ? wiphy->name : "global");
 
 	msg = l_genl_msg_new_sized(NL80211_CMD_GET_WIPHY, 128);
 
@@ -2199,37 +2222,22 @@ static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 		l_genl_msg_append_attr(msg, NL80211_ATTR_WIPHY, 4, &wiphy->id);
 
 	l_genl_msg_append_attr(msg, NL80211_ATTR_SPLIT_WIPHY_DUMP, 0, NULL);
-	id = l_genl_family_dump(nl80211, msg, wiphy_dump_callback,
+	*id = l_genl_family_dump(nl80211, msg, wiphy_dump_callback,
 						wiphy, wiphy_dump_done);
-	if (!id) {
+	if (!*id) {
 		l_error("Wiphy information dump failed");
 		l_genl_msg_unref(msg);
 		return;
 	}
 
-	/*
-	 * Another update while dumping wiphy. This next dump should supercede
-	 * the first and not result in a DONE event until this new dump is
-	 * finished. This is because the disabled frequencies are in an unknown
-	 * state and could cause incorrect behavior by any watchers.
-	 */
-	no_start_event = wiphy_cancel_last_dump(wiphy);
-
 	/* Limited dump so just emit the event for this wiphy */
 	if (wiphy) {
-		wiphy->dump_id = id;
-
-		if (no_start_event)
-			return;
-
 		WATCHLIST_NOTIFY(&wiphy->state_watches,
 				wiphy_state_watch_func_t, wiphy,
 				WIPHY_STATE_WATCH_EVENT_REGDOM_STARTED);
 		return;
 	}
 
-	wiphy_dump_id = id;
-
 	/* Otherwise for a global regdom change notify for all wiphy's */
 	for (e = l_queue_get_entries(wiphy_list); e; e = e->next) {
 		struct wiphy *w = e->data;
@@ -2237,9 +2245,6 @@ static void wiphy_dump_after_regdom(struct wiphy *wiphy)
 		if (w->self_managed)
 			continue;
 
-		if (no_start_event)
-			continue;
-
 		WATCHLIST_NOTIFY(&w->state_watches, wiphy_state_watch_func_t,
 				w, WIPHY_STATE_WATCH_EVENT_REGDOM_STARTED);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] station: fix reentrant FT wiphy radio work insertion
  2023-05-10 20:41 [PATCH 0/1] Fix wiphy regdom logic James Prestwood
  2023-05-10 20:41 ` [PATCH 1/1] wiphy: fix regdom change wiphy dump logic James Prestwood
  2023-05-10 20:41 ` [PATCH 1/2] " James Prestwood
@ 2023-05-10 20:41 ` James Prestwood
  2023-05-10 20:43 ` [PATCH 0/1] Fix wiphy regdom logic James Prestwood
  3 siblings, 0 replies; 5+ messages in thread
From: James Prestwood @ 2023-05-10 20:41 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

The failure path for FT was inserting the same work item inside
the 'do_work' callback. This caused duplicate work items to be
inside the radio work queue, and upon destroy the ID would get
zero'ed out. This caused a host of valgrind complaints when this
behavior is triggered.

This can be seen with slight modifiations to the FT test:

src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:02
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 6
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 7
src/wiphy.c:wiphy_radio_work_done() Work item 5 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 6
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 6 done

--- Work item 7 is 'station_ft_work_ready()' ---

src/wiphy.c:wiphy_radio_work_next() Starting work item 7
src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
src/station.c:station_try_next_transition() 9, target 12:00:00:00:00:03
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 8

--- Here we insert work item 9, which is a duplicate of 7 ---

src/wiphy.c:wiphy_radio_work_insert() Inserting work item 9
src/wiphy.c:wiphy_radio_work_next() Starting work item 8
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 8 done

--- And finally, the duplicated work item starts ---

src/wiphy.c:wiphy_radio_work_next() Starting work item 0
src/station.c:station_debug_event() StationDebug.Event(ft-roam-failed)
src/station.c:station_roam_failed() 9
src/station.c:station_roam_trigger_cb() 9
---
 src/station.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/station.c b/src/station.c
index 0d8f7628..3c6e43ce 100644
--- a/src/station.c
+++ b/src/station.c
@@ -122,6 +122,7 @@ struct station {
 
 	uint32_t wiphy_watch;
 
+	struct l_idle *ft_idle;
 	struct wiphy_radio_work_item ft_work;
 
 	bool preparing_roam : 1;
@@ -1739,6 +1740,11 @@ static void station_roam_state_clear(struct station *station)
 		station->roam_freqs = NULL;
 	}
 
+	if (station->ft_idle) {
+		l_idle_remove(station->ft_idle);
+		station->ft_idle = NULL;
+	}
+
 	l_queue_clear(station->roam_bss_list, l_free);
 
 	ft_clear_authentications(netdev_get_ifindex(station->netdev));
@@ -2282,6 +2288,16 @@ static void station_preauthenticate_cb(struct netdev *netdev,
 
 static void station_transition_start(struct station *station);
 
+static void station_transition_idle(struct l_idle *idle, void *user_data)
+{
+	struct station *station = user_data;
+
+	l_idle_remove(station->ft_idle);
+	station->ft_idle = NULL;
+
+	station_transition_start(station);
+}
+
 static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
 {
 	struct station *station = l_container_of(item, struct station, ft_work);
@@ -2300,7 +2316,8 @@ static bool station_ft_work_ready(struct wiphy_radio_work_item *item)
 	if (ret == -ENOENT) {
 		station_debug_event(station, "ft-roam-failed");
 try_next:
-		station_transition_start(station);
+		station->ft_idle = l_idle_create(station_transition_idle,
+							station, NULL);
 		return true;
 	} else if (ret < 0)
 		goto assoc_failed;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/1] Fix wiphy regdom logic
  2023-05-10 20:41 [PATCH 0/1] Fix wiphy regdom logic James Prestwood
                   ` (2 preceding siblings ...)
  2023-05-10 20:41 ` [PATCH 2/2] station: fix reentrant FT wiphy radio work insertion James Prestwood
@ 2023-05-10 20:43 ` James Prestwood
  3 siblings, 0 replies; 5+ messages in thread
From: James Prestwood @ 2023-05-10 20:43 UTC (permalink / raw)
  To: iwd

Sorry, something got screwed up and included additional patches. Please 
disregard.

On 5/10/23 1:41 PM, James Prestwood wrote:
> Below are some logs when this behavior was seen. IWD never
> transitions out of the ft-roaming state due to the key setting
> callbacks never being called. Its hard to know exactly what
> messages were sent/received since its coming from iwmon but
> netlink appears to completely lock up until a watchdog restarts
> the IWD service.
> 
> After the patch was applied there were zero instances of this
> behavior which seems to indicate the kernel is unable to handle
> multiple GET_WIPHY calls.
> 
> May 10 00:11:54 iwd[336880]: src/station.c:station_try_next_transition() 14, target aa:46:8d:1c:16:e6
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 14498
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 14499
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_done() Work item 14497 done
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_next() Starting work item 14498
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
> May 10 00:11:54 iwd[336880]: src/ft.c:ft_send_authenticate()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
> May 10 00:11:54 kernel: wlan0: disconnect from AP aa:46:8d:1c:17:25 for new assoc to aa:46:8d:1c:16:e6
> May 10 00:11:54 locus[917]: State changed to roaming
> May 10 00:11:54 locus[917]: Start 10000ms timeout waiting for connected/roaming
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_unicast_notify() Unicast notification Frame(59)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_done() Work item 14498 done
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_radio_work_next() Starting work item 14499
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_cqm_rssi_update()
> May 10 00:11:54 iwd[336880]: src/station.c:station_enter_state() Old State: connected, new state: ft-roaming
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Frame Wait Cancel(67)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_deauthenticate_event()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_disconnect_event()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
> May 10 00:11:54 kernel: wlan0: associate with aa:46:8d:1c:16:e6 (try 1/3)
> May 10 00:11:54 kernel: wlan0: RX ReassocResp from aa:46:8d:1c:16:e6 (capab=0x1511 status=0 aid=7)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> May 10 00:11:54 systemd-networkd[393]: wlan0: Lost carrier
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_associate_event()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_set_gtk() ifindex=14 key_idx=2
> May 10 00:11:54 iwd[336880]: src/station.c:station_handshake_event() Setting keys
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_set_tk() ifindex=14 key_idx=0
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
> May 10 00:11:54 iwd[336880]: src/netdev.c:parse_request_ies()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_get_oci()
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is US
> May 10 00:11:54 iwd[336880]: src/wiphy.c:wiphy_cancel_last_dump() Canceling pending regdom wiphy dump (global)
> May 10 00:11:54 kernel: wlan0: associated
> May 10 00:11:54 kernel: ath: EEPROM regdomain: 0x8348
> May 10 00:11:54 kernel: ath: EEPROM indicates we should expect a country code
> May 10 00:11:54 kernel: ath: doing EEPROM country->regdmn map search
> May 10 00:11:54 kernel: ath: country maps to regdmn code: 0x3a
> May 10 00:11:54 kernel: ath: Country alpha2 being used: US
> May 10 00:11:54 kernel: ath: Regpair used: 0x3a
> May 10 00:11:54 kernel: ath: regdomain 0x8348 dynamically updated by country element
> May 10 00:11:55 iwd[336880]: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
> May 10 00:11:55 iwd[336880]: src/netdev.c:netdev_cqm_event() Signal change event (above=0 signal=-76)
> 
> --- Nothing here for 10 seconds, until the watchdog triggers ---
> 
> May 10 00:12:04 locus[917]: IWD watchdog triggered (reason=state)
> May 10 00:12:04 iwd[336880]: Terminate
> May 10 00:12:04 iwd[336880]: src/netdev.c:netdev_free() Freeing netdev wlan0[14]
> May 10 00:12:04 iwd[336880]: src/device.c:device_free()
> May 10 00:12:04 iwd[336880]: src/station.c:station_free()
> May 10 00:12:04 iwd[336880]: src/netconfig.c:netconfig_destroy()
> May 10 00:12:04 iwd[336880]: src/netconfig.c:netconfig_event_handler() l_netconfig event 2
> May 10 00:12:04 iwd[336880]: src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: 10.193.158.118
> May 10 00:12:04 iwd[336880]: src/resolve.c:resolve_systemd_revert() ifindex: 14
> May 10 00:12:04 systemd[1]: Stopping Wireless service...
> 
> The iwmon logs are lengthy, but I've included the bit where the
> logs abruptly stop for 10 seconds until the watchdog kicks in.
> Prior there was no key setting or anything connection related
> after the connect event.
> 
> May 10 00:11:55:     Extended Capabilities: len 8
> May 10 00:11:55:         Capability: bit  2: Extended channel switching
> May 10 00:11:55:         Capability: bit 62: Opmode Notification
> May 10 00:11:55:     Extended Capabilities Mask: 170 len 8
> May 10 00:12:04:         04 00 00 00 00 00 00 40                          .......@
> May 10 00:12:04:     VHT Capability Mask: 176 len 12
> May 10 00:12:04:         f0 1f 80 33 ff ff 00 00 ff ff 00 00              ...3........
> May 10 00:12:04:     MAC Address 00:0E:8E:A3:02:FE
> May 10 00:12:04: > Result: New Wiphy (0x03) len 36 [multi]                     1683677515.942843
> May 10 00:12:04:     Wiphy: 0 (0x00000000)
> May 10 00:12:04:     Wiphy Name: phy0
> May 10 00:12:04:     Generation: 1 (0x00000001)
> May 10 00:12:04:     Max AP Assoc Station: 202 len 4
> May 10 00:12:04:         20 00 00 00
> 
> James Prestwood (1):
>    wiphy: fix regdom change wiphy dump logic
> 
>   src/wiphy.c | 103 +++++++++++++++++++++++++++-------------------------
>   1 file changed, 54 insertions(+), 49 deletions(-)
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-05-10 20:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-10 20:41 [PATCH 0/1] Fix wiphy regdom logic James Prestwood
2023-05-10 20:41 ` [PATCH 1/1] wiphy: fix regdom change wiphy dump logic James Prestwood
2023-05-10 20:41 ` [PATCH 1/2] " James Prestwood
2023-05-10 20:41 ` [PATCH 2/2] station: fix reentrant FT wiphy radio work insertion James Prestwood
2023-05-10 20:43 ` [PATCH 0/1] Fix wiphy regdom logic James Prestwood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).