From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>,
Kalle Valo <kvalo@codeaurora.org>,
linux-wireless@vger.kernel.org, Netdev <netdev@vger.kernel.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2())
Date: Wed, 11 Sep 2019 12:58:29 +0100 [thread overview]
Message-ID: <CAHk-=wj_jneK+UYzHhjwsH0XxP0knM+2o2OeFVEz-FjuQ77-ow@mail.gmail.com> (raw)
In-Reply-To: <feecebfcceba521703f13c8ee7f5bb9016924cb6.camel@sipsolutions.net>
On Wed, Sep 11, 2019 at 11:26 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> Hi,
>
> > So I'm at LCA
>
> When did LCA move to Portugal? ;-))
Heh. I may be jetlagged and not thinking straight. LCA/LPC ;)
> > Previous resume looks normal:
> > [snip]
> > wlp2s0: Limiting TX power to 23 (23 - 0) dBm as advertised by
> > 54:ec:2f:05:70:2c
>
> Is that the message you meant?
>
> > I say _almost_, because I don't see the "No TX power limit" for the
> > country lookup (yes, Portugal) this time?
>
> because here you had it too, just a bit earlier. It usually comes when a
> beacon is received the first time, which depends on the AP timing.
Duh. I'm blind and didn't notice, because I was expecting it in the same order.
And I didn't think about it or double-check, because the errors that
then followed later _looked_ like that TX power failing that I thought
hadn't happened.
> I don't _think_ any of the above would be a reason to disconnect, but it
> clearly looks like the device got stuck at this point, since everything
> just fails afterwards.
Yeah, maybe the power stuff was just another effect of things getting
stuck, rather than the reason for it getting stuck.
So I probably mis-attributed the cause.
> > WARNING: CPU: 4 PID: 1246 at net/mac80211/sta_info.c:1057
> > __sta_info_destroy_part2+0x147/0x150 [mac80211]
>
> Not really a surprise. Perhaps we shouldn't even WARN_ON() this, if the
> driver is stuck completely and returning errors to everything that
> doesn't help so much.
>
> Then again, the stack trace was helpful this time:
>
> > ieee80211_set_disassoc+0xc2/0x590 [mac80211]
> > ieee80211_mgd_deauth.cold+0x4a/0x1b8 [mac80211]
> > cfg80211_mlme_deauth+0xb3/0x1d0 [cfg80211]
> > cfg80211_mlme_down+0x66/0x90 [cfg80211]
> > cfg80211_disconnect+0x129/0x1e0 [cfg80211]
> > cfg80211_leave+0x27/0x40 [cfg80211]
> > cfg80211_netdev_notifier_call+0x1a7/0x4e0 [cfg80211]
> > notifier_call_chain+0x4c/0x70
> > __dev_close_many+0x57/0x100
> > dev_close_many+0x8d/0x140
> > dev_close.part.0+0x44/0x70
> > cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> > cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> > rfkill_set_block+0x92/0x140 [rfkill]
> > rfkill_fop_write+0x11f/0x1c0 [rfkill]
> > vfs_write+0xb6/0x1a0
>
>
> Since we see that something actually did an rfkill operation. Did you
> push a button there?
No, I tried to turn off and turn on Wifi manually (no button, just the
settings panel).
I didn't notice the WARN_ON(), I just noticed that there was no
networking, and "turn it off and on again" is obviously the first
thing to try ;)
> You don't happen to have timing information on these logs, perhaps
> recorded in the logfile/journal?
Sure. I cleaned up the logs to not spam people with lots of illegible
data, but it's all in the journal log.
Rough timeline:
Sep 11 03:40:00 xps13 kernel: PM: suspend entry (s2idle)
Sep 11 03:40:00 xps13 kernel: Filesystems sync: 0.028 seconds
...
Sep 11 10:13:14 xps13 kernel: Restarting tasks ... done.
Sep 11 10:13:14 xps13 kernel: PM: suspend exit
Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: UART prints enabled
Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: unsupported HTC
service id: 1536
Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticate with 54:ec:2f:05:70:2c
Sep 11 10:13:23 xps13 kernel: wlp2s0: send auth to 54:ec:2f:05:70:2c (try 1/3)
Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticated
Sep 11 10:13:23 xps13 kernel: wlp2s0: Limiting TX power to 23 (23 - 0)
dBm as advertised by 54:ec:2f:05:70:2c
...
Sep 11 10:13:23 xps13 kernel: ath: regdomain 0x826c dynamically
updated by country element
Sep 11 10:13:24 xps13 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0:
link becomes ready
Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: wmi command
16387 timeout, restarting hardware
...
Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: failed to read
hi_board_data address: -16
Sep 11 10:27:10 xps13 kernel: ath10k_pci 0000:02:00.0: failed to
receive initialized event from target: 00000000
Sep 11 10:27:13 xps13 kernel: ath10k_pci 0000:02:00.0: failed to
receive initialized event from target: 00000000
...
Sep 11 10:27:13 xps13 kernel: WARNING: CPU: 4 PID: 1246 at
net/mac80211/sta_info.c:1057 __sta_info_destroy_part2+0x147/0x150
[mac80211]
but if you want full logs I can send them in private to you.
I do suspect it's atheros and suspend/resume or something. The
wireless clearly worked for a while after the resume, but then at some
point it stopped.
> It seems odd to me, since the RTNL is acquired by
> cfg80211_rfkill_set_block() and that doesn't even have an error path, it
> just does
> rtnl_lock();
> cfg80211_shutdown_all_interfaces(&rdev->wiphy);
> rtnl_unlock();
>
> The only explanation I therefore have is that something is just taking
> *forever* in that code path, hence my question about timing information
> on the logs.
Yeah, maybe it would time out everything eventually. But not for a
long time. It hadn't cleared up by
Sep 11 10:36:21 xps13 gnome-session-f[6837]: gnome-session-failed:
Fatal IO error 0 (Success) on X server :0.
which is when I shut down the machine (and had to then force a hard
power-off because the shutdown wanted things that needed the rtnl_lock
to go away)
Linus
next prev parent reply other threads:[~2019-09-11 11:58 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-11 10:05 WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2()) Linus Torvalds
2019-09-11 10:26 ` Johannes Berg
2019-09-11 11:58 ` Linus Torvalds [this message]
2019-09-11 12:04 ` Johannes Berg
2019-09-11 13:03 ` Ben Greear
2019-09-11 13:21 ` Linus Torvalds
2019-09-11 13:31 ` Ben Greear
2019-09-11 13:32 ` Kalle Valo
2019-09-11 18:19 ` Kalle Valo
2019-09-11 18:23 ` Johannes Berg
2019-09-11 18:48 ` Kalle Valo
2019-09-11 18:10 ` Kalle Valo
2019-10-20 15:12 ` Tomislav Požega
2019-10-21 14:01 ` Ben Greear
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wj_jneK+UYzHhjwsH0XxP0knM+2o2OeFVEz-FjuQ77-ow@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=johannes@sipsolutions.net \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).