linux-kernel.vger.kernel.org archive mirror
From: Alexander Wetzel <alexander@wetzel-home.de>
To: Linux regressions mailing list <regressions@lists.linux.dev>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Mann <rauchwolke@gmx.net>,
	Stanislaw Gruszka <stf_xl@wp.pl>,
	Helmut Schaa <helmut.schaa@googlemail.com>,
	Johannes Berg <johannes.berg@intel.com>
Subject: Re: [Regression] rt2800usb - Wifi performance issues and connection drops
Date: Tue, 7 Mar 2023 21:54:31 +0100
Message-ID: <debc7fe9-204d-63a7-aa61-91b20a46f385@wetzel-home.de>
In-Reply-To: <6025e17e-4c29-6d36-6b9c-2fec543b21c4@wetzel-home.de>

>>
> 
> I just uploaded a test patch to bugzilla.
> Please have a look if that fixes the issue.
> 
> If not I would be interested in the output of your iTXQ status.
> Enable CONFIG_MAC80211_DEBUGFS and run this command when the connection 
> is bad and send/share/upload to bugzilla the resulting debug.out:
> 
> k=1; while [ $k -lt 10 ]; do \
> cat /sys/kernel/debug/ieee80211/phy?/netdev:*/stations/*/aqm; \
> k=$(($k+1)); done >> debug.out

Thomas and I continued with some debugging in
https://bugzilla.kernel.org/show_bug.cgi?id=217119

But the results so far are unexpected, and we decided to continue the
debugging with the group here, hoping someone sees something I miss.

A very short summary of where we are:
I can't reproduce the bug with a very similar card and kernel config so
far. Thomas' card stops the iTXQs for intervals >30s. Mine operates
normally.

A more useful but longer summary:

Thomas updated to a 6.2 kernel and reported "connection drops and 
bandwidth problems" with his rt2800usb wlan card. (6.1 is ok.) Asked for 
some more details he reported:
"...slow bandwidth stuff works better, but the main problem/test case is 
to start a 8-16 mbit video stream, which sometimes runs for a few 
seconds and then stops or it doesn't start at all"

He bisected the issue and identified my commit 4444bc2116ae ("wifi:
mac80211: Proper mark iTXQs for resumption") as the culprit.

Checking the internal iTXQ status while the issue is ongoing shows that
TID zero is flagged as dirty and thus is not transmitting queued
packets. The interesting line from
/sys/kernel/debug/ieee80211/phy?/netdev:*/stations/*/aqm:
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 619736 404 1681 0 0 0 1 4513965 3019 0xe(RUN AMPDU NO-AMSDU DIRTY)
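
For anyone who wants to check many such lines at once, here is a minimal Python sketch that splits one aqm row into named fields and flags a TID that has backlog while DIRTY is set. The column names are taken from the header above. The bit positions in FLAG_BITS are an assumption inferred from this one sample (0xe shown as RUN AMPDU NO-AMSDU DIRTY); please check enum txq_info_flags in net/mac80211/ieee80211_i.h for your kernel. The helper names are made up for this example.

```python
# Column names copied from the header line of the aqm debugfs file.
COLUMNS = ("tid ac backlog-bytes backlog-packets new-flows drops marks "
           "overlimit collisions tx-bytes tx-packets flags").split()

# ASSUMPTION: bit positions inferred from the single sample line (0xe with
# AMPDU, NO-AMSDU and DIRTY set, "RUN" printed because the STOP bit is clear).
FLAG_BITS = {0: "STOP", 1: "AMPDU", 2: "NO-AMSDU", 3: "DIRTY"}
DIRTY_MASK = 1 << 3


def parse_aqm_row(line):
    """Split one data row into a dict keyed by column name."""
    fields = line.split(maxsplit=len(COLUMNS) - 1)
    row = dict(zip(COLUMNS, fields))
    raw_flags = row["flags"].split("(", 1)[0]  # "0xe(RUN ...)" -> "0xe"
    for name in COLUMNS[:-1]:
        row[name] = int(row[name])
    row["flags"] = int(raw_flags, 16)
    return row


def decode_flags(value):
    """Translate the numeric flags back into the names debugfs prints."""
    names = [name for bit, name in FLAG_BITS.items() if value & (1 << bit)]
    if "STOP" not in names:
        names.insert(0, "RUN")
    return names


def is_stuck(row):
    """True if packets are queued while the TID is marked DIRTY."""
    return row["backlog-packets"] > 0 and bool(row["flags"] & DIRTY_MASK)


SAMPLE = "0 2 619736 404 1681 0 0 0 1 4513965 3019 0xe(RUN AMPDU NO-AMSDU DIRTY)"
row = parse_aqm_row(SAMPLE)
print(row["tid"], row["backlog-packets"], decode_flags(row["flags"]), is_stuck(row))
```

A single DIRTY row only says that the TID was dirty at that moment; it becomes interesting when the same TID stays DIRTY with backlog across repeated reads.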

--> The "normal" iTXQ handling IEEE80211_AC_BE has queued packets and is
flagged as DIRTY. There is even a potential race when setting the DIRTY
flag, but the fix for that does not help.

Thus Thomas applied two debug patches to better understand why the
DIRTY flag is not cleared.

Looking at the output from those, we see that the driver stops Tx by
calling ieee80211_stop_queue(). When ieee80211_wake_queue() is called,
mac80211 correctly resumes TX but is stopped by the driver again after a
single packet. (The start of the relevant log is missing, so there may
have been more initially.)
I assume TX is still ok at that stage. But after some single Tx
operations the driver stops the queues again. Here is the relevant part
of the log:
[  179.584997] XXXX __ieee80211_wake_txqs: waking TID 0
[  179.585022] XXXX drv_tx: TX
[  179.585027] XXXX ieee80211_stop_queue: called
[  179.585028] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.585030] XXXX __ieee80211_wake_txqs: TID 3 NOT dirty
[  179.585031] XXXX __ieee80211_wake_txqs: TID 8 NOT dirty
[  179.585033] XXXX __ieee80211_wake_txqs: TID 11 NOT dirty
[  179.585034] XXXX __ieee80211_wake_txqs: EXIT
[  179.585035] XXXX __ieee80211_wake_txqs: ENTRY
[  179.585036] XXXX __ieee80211_wake_txqs: TID 1 NOT dirty
[  179.585037] XXXX __ieee80211_wake_txqs: TID 2 NOT dirty
[  179.585038] XXXX __ieee80211_wake_txqs: TID 9 NOT dirty
[  179.585040] XXXX __ieee80211_wake_txqs: TID 10 NOT dirty
[  179.585041] XXXX __ieee80211_wake_txqs: EXIT
[  179.585047] XXXX drv_tx: TX
[  179.585056] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.585271] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.585868] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.586120] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.586544] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.586792] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.587317] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.587591] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  179.588569] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
....
[  214.307617] XXXX ieee80211_wake_queue: called


--> So the driver blocked TX for more than 30s, which is a good
explanation of what Thomas observes.

But there is nothing mac80211 can do differently here. Whatever the
real reason for the issue is, it's nothing obvious that I can see.
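
To pull such gaps out of a longer log, here is a small Python sketch. It assumes the "XXXX function: message" line format produced by the debug patches and simply pairs each ieee80211_stop_queue with the next ieee80211_wake_queue. It does not distinguish between hardware queues, since the log above does not print a queue number. The LOG excerpt is a shortened copy of the lines above and find_stalls is a name made up for this example.

```python
import re

# Matches lines such as "[  179.585027] XXXX ieee80211_stop_queue: called".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+XXXX\s+(\w+):")


def find_stalls(lines, min_gap=1.0):
    """Return (stop_time, wake_time, gap) for stop->wake pairs >= min_gap seconds."""
    stalls = []
    stopped_at = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, func = float(m.group(1)), m.group(2)
        if func == "ieee80211_stop_queue" and stopped_at is None:
            stopped_at = ts
        elif func == "ieee80211_wake_queue" and stopped_at is not None:
            gap = ts - stopped_at
            if gap >= min_gap:
                stalls.append((stopped_at, ts, gap))
            stopped_at = None
    return stalls


# Shortened excerpt of the log quoted in this mail.
LOG = """\
[  179.584997] XXXX __ieee80211_wake_txqs: waking TID 0
[  179.585022] XXXX drv_tx: TX
[  179.585027] XXXX ieee80211_stop_queue: called
[  179.585028] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  214.307617] XXXX ieee80211_wake_queue: called
""".splitlines()

for stop, wake, gap in find_stalls(LOG):
    print(f"TX stopped at {stop:.6f}, woken at {wake:.6f}: {gap:.3f} s")
```

For the excerpt this reports one stall of roughly 34.7 seconds, between the stop at 179.585027 and the wake at 214.307617.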

Luckily I found a nearly identical card that uses the same driver.
Thomas' system:
Linux version 6.2.2-gentoo (root@foo) (gcc (Gentoo
Hardened 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.39
p5) 2.39.0) #2 SMP Fri Mar  3 16:59:02 CET 2023
ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0005 detected
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'

My system, using the kernel config from Thomas with only minor 
modifications (different filesystems and initramfs settings and enabled 
mac80211 debug and developer options):
Linux version 6.2.2-gentoo (root@Perry.mordor) (gcc (Gentoo
12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.40 p2)
2.40.0) #2 SMP Tue Mar  7 18:18:47 CET 2023
ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3070, rev 0200 detected
ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0005 detected
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 
'rt2870.bin'
ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - 
version: 0.36

But there is one big difference on my system: I can't reproduce the bug 
so far. It's working as it should... (I did not apply the debug patches 
myself so far)

I'm now planning to look a bit more into the rt2800usb driver and 
provide another debug patch for interesting looking code pieces in it.

@Thomas:
I've also uploaded the binary kernel I'm running at the moment here:
https://www.awhome.eu/s/5FjqMS73rtCtSBM

That kernel should also be able to boot and operate your system. Can you
try that and tell me if that makes any difference?

I'm also planning to provide some more debug patches to figure out
which part of commit 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs
for resumption") causes the issue for you. Assuming my understanding
above is correct, the patch should not really fix/break anything for
you... With the findings above I would have expected your git bisect to
identify commit a790cc3a4fad ("wifi: mac80211: add wake_tx_queue
callback to drivers") as the first broken commit...

Alexander
