* AP + STA on DFS channel breaks DFS detection.
@ 2022-04-27 21:32 Ben Greear
  2022-04-27 23:35 ` Ben Greear
  2022-05-10 14:31 ` Johannes Berg
  0 siblings, 2 replies; 4+ messages in thread
From: Ben Greear @ 2022-04-27 21:32 UTC (permalink / raw)
  To: linux-wireless

I am using 5.17.4+ kernel, MT7915 radios.  One radio (wiphy0) is acting as AP on
channel 132.  It starts, does CAC and starts working fine.

Then, I bring up a station on wiphy1 (on same machine).  The STA connects to the AP
on wiphy0 and starts running traffic for a short time (usually < 1 minute).  And then
the AP gets stopped.  I don't think this is specific to connecting AP to STA on same machine,
probably if STA connected to another AP on channel 132 it would have same issue.

I think I have tracked this down by adding prints and WARN_ON to find
the interesting state changes.  It looks like when the STA changes its
regdom (probably because it is admin-up and/or associated to the AP), then the state of the
channel's dfs_state is reset.  Channel objects are per band, not per wiphy.

And then a bit later, a timer kicks off and decides that CAC has not completed
(because it already completed earlier on the AP, but chan->dfs_state was lost,
and STA will not do CAC anyway.)

So, question is, how in the world to fix this properly!
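
As a rough illustration of the sequence described above, here is a minimal user-space sketch (not kernel code). The struct and the three helper functions are hypothetical stand-ins; only the enum ordering is taken from nl80211, where AVAILABLE is 2, matching "old_state: 2" in the log. It models a channel that completes CAC, is then reset by a regdom update, and afterwards no longer looks as if CAC was done.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as enum nl80211_dfs_state, so AVAILABLE == 2 as in the log. */
enum dfs_state { DFS_USABLE, DFS_UNAVAILABLE, DFS_AVAILABLE };

/* Hypothetical stand-in for the relevant fields of struct ieee80211_channel. */
struct chan {
    int center_freq;            /* MHz */
    enum dfs_state dfs_state;
};

/* The AP finishes CAC: the channel goes from USABLE to AVAILABLE. */
static void cac_finished(struct chan *c)
{
    assert(c->dfs_state == DFS_USABLE);
    c->dfs_state = DFS_AVAILABLE;
}

/* A regdom update that resets the DFS state unconditionally. */
static void regdom_update(struct chan *c)
{
    c->dfs_state = DFS_USABLE;
}

/* Later check: does the channel still record a completed CAC? */
static bool cac_done(const struct chan *c)
{
    return c->dfs_state == DFS_AVAILABLE;
}
```

Calling cac_finished(), then regdom_update(), then cac_done() on a channel 132 object (5660 MHz) returns false, which is the state the later timer would observe.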

Apr 27 14:20:42 lf0350-9634 kernel: cfg80211: handle-single-rule: wiphy0 chan: 00000000d9f5550d  old_state: 2  new: DFS_USABLE
Apr 27 14:20:42 lf0350-9634 kernel: ------------[ cut here ]------------
Apr 27 14:20:42 lf0350-9634 kernel: WARNING: CPU: 1 PID: 75 at net/wireless/reg.c:1830 wiphy_update_regulatory.cold.32+0x3ba/0x796 [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel: Modules linked in: nf_conntrack_netlink nfnetlink iptable_raw xt_CT nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter vrf 8021q garp mrp stp llc macvlan pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core pcengines_apuv2 gpio_keys_polled leds_gpio gpio_amd_fch amd64_edac edac_mce_amd kvm_amd mt7915e kvm irqbypass bfq mt76_connac_lib mt76 mac80211 fam15h_power k10temp i2c_piix4 cfg80211 acpi_cpufreq nfsd auth_rpcgss nfs_acl sch_fq_codel lockd grace drm fuse agpgart sunrpc zram sdhci_pci cqhci sdhci igb hwmon i2c_algo_bit mmc_core sp5100_tco dca xhci_pci xhci_pci_renesas ccp i2c_core [last unloaded: nfnetlink]
Apr 27 14:20:42 lf0350-9634 kernel: CPU: 1 PID: 75 Comm: kworker/1:1 Tainted: G        W         5.17.4+ #14
Apr 27 14:20:42 lf0350-9634 kernel: Hardware name: PC Engines APU2/APU2, BIOS 4.0.7 02/28/2017
Apr 27 14:20:42 lf0350-9634 kernel: Workqueue: events reg_regdb_apply [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel: RIP: 0010:wiphy_update_regulatory.cold.32+0x3ba/0x796 [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel: sta03000: Limiting TX power to 20 (23 - 3) dBm as advertised by 00:0a:52:06:9f:10
Apr 27 14:20:42 lf0350-9634 kernel: Code: 48 8b b3 c8 01 00 00 48 85 f6 0f 84 a8 00 00 00 41 8b 4e 2c 4c 89 f2 48 c7 c7 20 10 51 a0 e8 7e 70 6d e1 41 83 7e 2c 02 75 02 <0f> 0b 48 8b 05 b8 25 30 e2 8b 4c 24 24 41 c7 46 2c 00 00 00 00 41
Apr 27 14:20:42 lf0350-9634 kernel: RSP: 0018:ffffc90000297dc0 EFLAGS: 00010246
Apr 27 14:20:42 lf0350-9634 kernel: RAX: 000000000000005a RBX: ffff88810bca03a0 RCX: 0000000000000027
Apr 27 14:20:42 lf0350-9634 kernel: RDX: 0000000000000000 RSI: ffff88811ac9c440 RDI: ffff88811ac9c448
Apr 27 14:20:42 lf0350-9634 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
Apr 27 14:20:42 lf0350-9634 kernel: R10: 0000000000000001 R11: ffffc90000297bf0 R12: ffff8880141a399c
Apr 27 14:20:42 lf0350-9634 kernel: R13: ffff88810c3803a0 R14: ffff88810ac00c28 R15: 0000000000565d60
Apr 27 14:20:42 lf0350-9634 kernel: FS:  0000000000000000(0000) GS:ffff88811ac80000(0000) knlGS:0000000000000000
Apr 27 14:20:42 lf0350-9634 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 14:20:42 lf0350-9634 kernel: CR2: 00005601e502dba0 CR3: 000000000bf36000 CR4: 00000000000406e0
Apr 27 14:20:42 lf0350-9634 kernel: Call Trace:
Apr 27 14:20:42 lf0350-9634 kernel:  <TASK>
Apr 27 14:20:42 lf0350-9634 kernel:  update_all_wiphy_regulatory+0x2e/0x90 [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel:  set_regdom+0x101/0x420 [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel:  reg_regdb_apply+0x65/0x90 [cfg80211]
Apr 27 14:20:42 lf0350-9634 kernel:  process_one_work+0x21a/0x3f0
Apr 27 14:20:42 lf0350-9634 kernel:  ? process_one_work+0x3f0/0x3f0
Apr 27 14:20:42 lf0350-9634 kernel:  worker_thread+0x28/0x3a0
Apr 27 14:20:42 lf0350-9634 kernel:  ? process_one_work+0x3f0/0x3f0
Apr 27 14:20:42 lf0350-9634 kernel:  kthread+0xd2/0x100
Apr 27 14:20:42 lf0350-9634 kernel:  ? kthread_complete_and_exit+0x20/0x20
Apr 27 14:20:42 lf0350-9634 kernel:  ret_from_fork+0x1f/0x30
Apr 27 14:20:42 lf0350-9634 kernel:  </TASK>
Apr 27 14:20:42 lf0350-9634 kernel: ---[ end trace 0000000000000000 ]---

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: AP + STA on DFS channel breaks DFS detection.
  2022-04-27 21:32 AP + STA on DFS channel breaks DFS detection Ben Greear
@ 2022-04-27 23:35 ` Ben Greear
  2022-05-10 14:31 ` Johannes Berg
  1 sibling, 0 replies; 4+ messages in thread
From: Ben Greear @ 2022-04-27 23:35 UTC (permalink / raw)
  To: linux-wireless

I figure this is a horrible enough patch to increase my sin by top-posting.
Just FYI, this 'fixes' it on my system.  Something better is needed probably,
but after staring at this all day, this is the best I have:

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index e1d3705a166e0..00955d11901a9 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1825,8 +1825,22 @@ static void handle_channel_single_rule(struct wiphy *wiphy,
                 return;
         }

-       chan->dfs_state = NL80211_DFS_USABLE;
-       chan->dfs_state_entered = jiffies;
+       /* HACK:  Work around problem where you have AP on DFS channel and then
+        * STA on different radio connects on same channel.  That causes regdom to change
+        * (or the code isn't smart enough to realize it didn't really change),
+        * because STA gets regdom from its AP, causing CAC to restart,
+        * which kills the AP interface before CAC can ever be finished.
+        * This is the one path that hits in my system, there are other places that may
+        * need latching too, and/or there is probably a better way to fix this.
+        * --Ben
+        */
+       if (chan->dfs_state != NL80211_DFS_AVAILABLE) {
+               chan->dfs_state = NL80211_DFS_USABLE;
+               chan->dfs_state_entered = jiffies;
+       } else {
+               pr_info("wiphy %s %pM: freq %d.%03d MHz: NOT setting DFS state back to baseline in single_rule, leave it latched at DFS_AVAILABLE.\n",
+                       dev_name(&wiphy->dev), wiphy->perm_addr, chan->center_freq, chan->freq_offset);
+       }

         chan->beacon_found = false;
         chan->flags = flags | bw_flags | map_regdom_flags(reg_rule->flags);

Thanks,
Ben




* Re: AP + STA on DFS channel breaks DFS detection.
  2022-04-27 21:32 AP + STA on DFS channel breaks DFS detection Ben Greear
  2022-04-27 23:35 ` Ben Greear
@ 2022-05-10 14:31 ` Johannes Berg
  2022-05-10 19:17   ` Ben Greear
  1 sibling, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2022-05-10 14:31 UTC (permalink / raw)
  To: Ben Greear, linux-wireless

On Wed, 2022-04-27 at 14:32 -0700, Ben Greear wrote:
> I am using 5.17.4+ kernel, MT7915 radios.  One radio (wiphy0) is acting as AP on
> channel 132.  It starts, does CAC and starts working fine.
> 
> Then, I bring up a station on wiphy1 (on same machine).  The STA connects to the AP
> on wiphy0 and starts running traffic for a short time (usually < 1 minute).  And then
> the AP gets stopped.  I don't think this is specific to connecting AP to STA on same machine,
> probably if STA connected to another AP on channel 132 it would have same issue.
> 
> I think I have tracked this down by adding prints and WARN_ON to find
> the interesting state changes.  It looks like when the STA changes its
> regdom (probably because it is admin-up and/or associated to the AP), then the state of the
> channel's dfs_state is reset.  Channel objects are per band, not per wiphy.

Actually, they are per wiphy, unless the driver sets up something else?
I couldn't really figure out the code there, but it looked dynamically
allocated, so not sure...

> And then a bit later, a timer kicks off and decides that CAC has not completed
> (because it already completed earlier on the AP, but chan->dfs_state was lost,
> and STA will not do CAC anyway.)
> 
> So, question is, how in the world to fix this properly!
> 

Good question!

But I'm not sure your description of this is quite right - the point
isn't that the channels are shared, the point is that you're getting to
update_all_wiphy_regulatory() which does update _all_ of the devices,
since you've just switched the regulatory domain.

I guess if the rules are identical on a given channel before/after the
regdomain switch, we might get away with not resetting the dfs_state?
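
One way to picture that idea is the user-space sketch below. It is only an illustration under stated assumptions: struct rule, rule_equal() and apply_rule() are hypothetical and much simpler than the real struct ieee80211_reg_rule handling in reg.c, and it assumes the rule that covered the channel before the switch is still at hand when the new one is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dfs_state { DFS_USABLE, DFS_UNAVAILABLE, DFS_AVAILABLE };

/* Hypothetical, simplified rule: frequency range and bandwidth in kHz, plus flags. */
struct rule {
    unsigned int start_khz, end_khz, max_bw_khz;
    unsigned int flags;
};

struct chan {
    enum dfs_state dfs_state;
};

static bool rule_equal(const struct rule *a, const struct rule *b)
{
    return a->start_khz == b->start_khz && a->end_khz == b->end_khz &&
           a->max_bw_khz == b->max_bw_khz && a->flags == b->flags;
}

/* Reset dfs_state only when the rule covering the channel really changed. */
static void apply_rule(struct chan *c, const struct rule *old_rule,
                       const struct rule *new_rule)
{
    assert(new_rule != NULL);
    if (old_rule && rule_equal(old_rule, new_rule))
        return;                 /* identical rule: keep the completed CAC */
    c->dfs_state = DFS_USABLE;  /* new or changed rule: back to baseline */
}
```

With an unchanged rule a channel in DFS_AVAILABLE stays there; with a changed rule, or no previous rule at all, it drops back to DFS_USABLE as it does today.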

johannes


* Re: AP + STA on DFS channel breaks DFS detection.
  2022-05-10 14:31 ` Johannes Berg
@ 2022-05-10 19:17   ` Ben Greear
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Greear @ 2022-05-10 19:17 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

On 5/10/22 7:31 AM, Johannes Berg wrote:
> On Wed, 2022-04-27 at 14:32 -0700, Ben Greear wrote:
>> I am using 5.17.4+ kernel, MT7915 radios.  One radio (wiphy0) is acting as AP on
>> channel 132.  It starts, does CAC and starts working fine.
>>
>> Then, I bring up a station on wiphy1 (on same machine).  The STA connects to the AP
>> on wiphy0 and starts running traffic for a short time (usually < 1 minute).  And then
>> the AP gets stopped.  I don't think this is specific to connecting AP to STA on same machine,
>> probably if STA connected to another AP on channel 132 it would have same issue.
>>
>> I think I have tracked this down by adding prints and WARN_ON to find
>> the interesting state changes.  It looks like when the STA changes its
>> regdom (probably because it is admin-up and/or associated to the AP), then the state of the
>> channel's dfs_state is reset.  Channel objects are per band, not per wiphy.
> 
> Actually, they are per wiphy, unless the driver sets up something else?
> I couldn't really figure out the code there, but it looked dynamically
> allocated, so not sure...
> 
>> And then a bit later, a timer kicks off and decides that CAC has not completed
>> (because it already completed earlier on the AP, but chan->dfs_state was lost,
>> and STA will not do CAC anyway.)
>>
>> So, question is, how in the world to fix this properly!
>>
> 
> Good question!
> 
> But I'm not sure your description of this is quite right - the point
> isn't that the channels are shared, the point is that you're getting to
> update_all_wiphy_regulatory() which does update _all_ of the devices,
> since you've just switched the regulatory domain.

It is a tangled mess of code, but if I understand it properly, the channel objects
*are* shared between wiphy devices, and...not sure it really matters either way,
since logically the regdom stuff is combined and then the values that are actually
used are the subset that is supported by all regdom configuration?  Maybe there
are special cases to that for things like iwlwifi that does its own regdom
stuff too...but I was only using mtk radios in this particular test case.

> 
> I guess if the rules are identical on a given channel before/after the
> regdomain switch, we might get away with not resetting the dfs_state?

Yes, and it seems very painful to properly make that determination the way
the regdom code is currently implemented:  I could not see a clean way to grab
a 'before' and 'after' snapshot to compare against.  During the regdom rebuild process,
the regdom changes with each new regdom input applied, so you cannot really check
incremental changes to detect a change in the end result.
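
To show what comparing only the end result could look like, here is a small user-space sketch. Everything in it is an assumption made for illustration: the helper names are hypothetical, a channel is reduced to a flags word plus dfs_state, and "unchanged" is taken to mean "same final flags", which may well be too weak a test for the real regulatory code. It also ignores dfs_state_entered.

```c
#include <assert.h>
#include <stddef.h>

enum dfs_state { DFS_USABLE, DFS_UNAVAILABLE, DFS_AVAILABLE };

/* Hypothetical, reduced channel: regulatory flags plus DFS state. */
struct chan {
    unsigned int flags;
    enum dfs_state dfs_state;
};

/* Copy the channel array before the regdom rebuild starts. */
static void snapshot(const struct chan *ch, struct chan *snap, size_t n)
{
    for (size_t i = 0; i < n; i++)
        snap[i] = ch[i];
}

/* One rebuild pass: sets new flags and, as today, resets dfs_state. */
static void apply_pass(struct chan *ch, const unsigned int *flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ch[i].flags = flags[i];
        ch[i].dfs_state = DFS_USABLE;
    }
}

/* After the last pass, restore dfs_state where the end result is unchanged. */
static void restore_unchanged(struct chan *ch, const struct chan *snap, size_t n)
{
    assert(ch != NULL && snap != NULL);
    for (size_t i = 0; i < n; i++)
        if (ch[i].flags == snap[i].flags)
            ch[i].dfs_state = snap[i].dfs_state;
}
```

The intermediate passes may leave the channels in any state; only the comparison after the final pass decides whether a previously completed CAC is kept.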

Thanks,
Ben

> 
> johannes
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


