stable.vger.kernel.org archive mirror
From: Thorsten Leemhuis <regressions@leemhuis.info>
To: Luca Coelho <luciano.coelho@intel.com>,
	Johannes Berg <johannes.berg@intel.com>
Cc: "regressions@lists.linux.dev" <regressions@lists.linux.dev>,
	"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	Stephane Poignant <stephane.poignant@proton.ch>
Subject: Re: Regression in 5.10.67: "iwlwifi: pcie: free RBs during configure" causes rx lockups with BAR_FRAME_RELEASE on AX200/AX201 when using 802.11ax
Date: Fri, 18 Mar 2022 15:38:47 +0100
Message-ID: <4fb68d44-0d4f-df16-21b8-3d85ebe0aadc@leemhuis.info>
In-Reply-To: <9e4ea11e-7d00-d2c4-7f80-862f0cbe96db@leemhuis.info>

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

FYI: looks like this isn't a regression, as Stephane mentioned in a
comment to the bko report:
https://bugzilla.kernel.org/show_bug.cgi?id=215660#c13

> So today I could reproduce on 5.10.46 after a few days of testing. It does not look like a regression.

Thus removing it from the regression tracking:

#regzbot invalid: After further testing it does not look like a
regression anymore

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.


On 14.03.22 12:44, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> has acted upon since it was reported more than ten days ago (afaics it
> only later became clear that this is a regression). That's why I decided
> to forward it to the lists and add a few relevant people to the CC. To
> quote from https://bugzilla.kernel.org/show_bug.cgi?id=215660:
> 
>>  Stephane Poignant 2022-03-04 17:24:49 UTC
>>
>> Created attachment 300529
>> lspci and ethtool outputs on reproducing systems
>>
>> Context:
>> - dense enterprise deployment, 10 lightweight APs (Aruba) on one office floor, up to 125 concurrent users total, up to 25 users per AP
>> - the wireless network supports 802.11n, 802.11ac and 802.11ax in 5 GHz band
>> - authentication is wpa2-psk
>> - client devices consist of a variety of endpoints (laptops, cell phones, tablets, smart devices), running various versions of Mac OSX, Linux, Windows, Android or iOS.
>> - certain clients support only 20 MHz; HT protection kicks in and turns off on APs as those clients move around. Consequently ht_operation_mode fluctuates between 4 and 6 even when staying on the same AP.
>> - the issue affects various laptops with Intel AX200 or AX201 chipsets, running Debian or Ubuntu with a recent kernel >= 5.10
>> - see attached file devices.txt for detailed information on the different laptops we have reproduced the issue on
>>
>>
>> Steps to reproduce:
>> - appears sometimes, but not always, after the iwlwifi STA roams from one AP to another
>> - seen more often when ht_operation_mode changes between 4 and 6 (but this alone is not sufficient to trigger the issue)
>> - STA disassociates from the current AP and associates to the new one successfully
>> - connectivity works on the new AP for a short period of time, usually between 30s and 1 minute
>> - then suddenly, the Rx path breaks. No received frames are visible on the STA wireless interface anymore. The AP reports that frames are retransmitted and not acknowledged by the STA.
>> - the Tx path keeps working. Frames sent by STA to AP are received and visible on the network
>> - in this state each inbound frame appears to trigger iwl_pcie_rx_handle_rb with cmd BAR_FRAME_RELEASE (seqnum is always the same):
>>
>> Mar  4 12:44:32 debian kernel: [15884.715812] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle Q 0: HW = 338, SW = 337
>> Mar  4 12:44:32 debian kernel: [15884.715819] iwlwifi 0000:00:14.3: iwl_pcie_get_rxb Got virtual RB ID 1348
>> Mar  4 12:44:32 debian kernel: [15884.715831] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle_rb Q 0: cmd at offset 0: BAR_FRAME_RELEASE (00.c2, seq 0xbfff)
>> Mar  4 12:44:32 debian kernel: [15884.715838] iwlwifi 0000:00:14.3: iwl_mvm_release_frames_from_notif Frame release notification for BAID 14, NSSN 169
>> Mar  4 12:44:32 debian kernel: [15884.715843] iwlwifi 0000:00:14.3: iwl_pcie_rx_handle_rb Q 0: RB end marker at offset 64
>> Mar  4 12:44:32 debian kernel: [15884.715852] iwlwifi 0000:00:14.3: iwl_pcie_restock_bd Assigned virtual RB ID 1348 to queue 0 index 334
>>
>> - those events do not appear during normal operation (or very rarely)
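To check a saved kernel log for the pattern described above (the same BAR_FRAME_RELEASE sequence number repeating on one queue), something like the following rough Python sketch might help. The regular expression is derived only from the sample line quoted above and may not match other kernel versions or debug levels; the function names and the threshold of 50 are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Pattern derived from the single sample line quoted in the report;
# the exact debug format is an assumption and may vary between kernels.
BAR_RE = re.compile(
    r"iwl_pcie_rx_handle_rb Q (?P<queue>\d+): cmd at offset \d+: "
    r"BAR_FRAME_RELEASE \([0-9a-f.]+, seq (?P<seq>0x[0-9a-f]+)\)"
)


def count_bar_frame_release(lines):
    """Count BAR_FRAME_RELEASE notifications per (queue, seq) pair."""
    counts = Counter()
    for line in lines:
        match = BAR_RE.search(line)
        if match:
            counts[(int(match.group("queue")), match.group("seq"))] += 1
    return counts


def looks_stuck(counts, threshold=50):
    """Hypothetical heuristic: some seq value repeats at least `threshold` times."""
    return any(n >= threshold for n in counts.values())
```

Feeding it the lines of a syslog file (for example `count_bar_frame_release(open(path))`) gives a per-sequence count; a single dominant entry would match the "seqnum is always the same" observation.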
>>
>>
>> Temporary resolution:
>> - in most cases, the STA remains in this state until Wi-Fi is restarted or until it roams to another AP
>> - while in that state, it may happen (rarely) that a few frames are received with very high latency, then the next ones are lost, for instance:
>>
>> [1646398334.114200] From 10.200.2.67 icmp_seq=148 Destination Host Unreachable
>> [1646398334.114242] From 10.200.2.67 icmp_seq=149 Destination Host Unreachable
>> [1646398334.114251] From 10.200.2.67 icmp_seq=150 Destination Host Unreachable
>> [1646398336.365181] 64 bytes from 10.200.2.1: icmp_seq=151 ttl=64 time=2251 ms
>> [1646398336.365237] 64 bytes from 10.200.2.1: icmp_seq=152 ttl=64 time=1227 ms
>> [1646398336.365250] 64 bytes from 10.200.2.1: icmp_seq=153 ttl=64 time=203 ms
>> [1646398375.042236] From 10.200.2.67 icmp_seq=188 Destination Host Unreachable
>> [1646398375.042291] From 10.200.2.67 icmp_seq=189 Destination Host Unreachable
>> [1646398375.042303] From 10.200.2.67 icmp_seq=190 Destination Host Unreachable
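In the sample above, the three replies (icmp_seq 151 to 153) all carry nearly the same arrival timestamp, while their reported round-trip times drop by about one second each (2251, 1227, 203 ms). That would be consistent with the replies having been held back and then delivered together in one burst. A small Python sketch to pull this out of a timestamped ping log follows; it assumes the output format shown above (which looks like `ping -D`), and the function names are hypothetical.

```python
import re

# Both patterns are derived from the sample output quoted in the report.
REPLY_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] \d+ bytes from \S+ icmp_seq=(?P<seq>\d+) "
    r".*?time=(?P<rtt>[\d.]+) ms"
)
UNREACH_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] From \S+ icmp_seq=(?P<seq>\d+) "
    r"Destination Host Unreachable"
)


def classify_ping_lines(lines):
    """Split ping output into replies (ts, seq, rtt_ms) and unreachable seqs."""
    replies, unreachable = [], []
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            replies.append(
                (float(match.group("ts")), int(match.group("seq")),
                 float(match.group("rtt")))
            )
            continue
        match = UNREACH_RE.search(line)
        if match:
            unreachable.append(int(match.group("seq")))
    return replies, unreachable


def arrival_spread(replies):
    """Seconds between the earliest and latest reply arrival (0.0 if none)."""
    if not replies:
        return 0.0
    stamps = [ts for ts, _seq, _rtt in replies]
    return max(stamps) - min(stamps)
```

A very small arrival spread combined with widely differing round-trip times is what would point to a burst release rather than ordinary jitter.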
>>
>>
>> Workaround:
>> - disable_11ax=1 prevents the problem from happening
>> [...]
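For anyone wanting to confirm that the workaround is actually in effect on a test machine: the parameter is normally set at module load time (for example via an `options iwlwifi disable_11ax=1` line in a modprobe.d file), and its current value should be readable through sysfs. The Python sketch below assumes the parameter is exposed at /sys/module/iwlwifi/parameters/disable_11ax and reads as "Y" or "N"; both the path and the helper name are assumptions, not something stated in the report.

```python
from pathlib import Path

# Assumed sysfs location of the module parameter; not taken from the report.
PARAM_PATH = Path("/sys/module/iwlwifi/parameters/disable_11ax")


def is_11ax_disabled(path=PARAM_PATH):
    """Return True/False for the parameter value, or None if it cannot be read."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # Module not loaded, parameter not exposed, or no permission.
        return None
    return value in ("Y", "1")
```

A result of None means the check was inconclusive, not that 802.11ax is enabled.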
> 
>>  Stephane Poignant 2022-03-10 14:48:39 UTC
>>
>> Did some further testing with vanilla kernel.
>> 5.10.66 and older DO NOT reproduce the issue.
>> 5.10.67 and newer DO reproduce.
>>
>> I see the following changes according to changelog:
>> iwlwifi: mvm: Fix scan channel flags settings
>> iwlwifi: fw: correctly limit to monitor dump
>> iwlwifi: mvm: fix access to BSS elements
>> iwlwifi: mvm: avoid static queue number aliasing
>> iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
>> iwlwifi: pcie: free RBs during configure
>>
>> Suspecting the one related with queues but no strong opinion atm.
>>
>>  Comment 6 Stephane Poignant 2022-03-11 10:18:29 UTC
>>
>> OK, so after some further testing, it turned out that after commenting out the following lines in drivers/net/wireless/intel/iwlwifi/pcie/trans.c:
>>
>> 	/* free all first - we might be reconfigured for a different size */
>> 	iwl_pcie_free_rbs_pool(trans);
>>
>> Which were introduced by the following commit:
>> iwlwifi: pcie: free RBs during configure
>> https://lore.kernel.org/all/iwlwifi.20210802170640.42d7c93279c4.I07f74e65aab0e3d965a81206fcb289dc92d74878@changeid/
>>
>> I'm no longer able to reproduce. Tested with vanilla 5.10.67, vanilla 5.10.88, and 5.10.92 with Debian patches.
>>
> 
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?
> 
> Anyway, to get this tracked:
> 
> #regzbot introduced: 608c8359c567b4a04dedbe
> #regzbot from: Stephane Poignant <stephane.poignant@proton.ch>
> #regzbot title: wireless: iwlwifi: regression in 5.10.67 due to
> "iwlwifi: pcie: free RBs during configure"
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215660
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
> 
