netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>,
	Kalle Valo <kvalo@codeaurora.org>,
	linux-wireless@vger.kernel.org, Netdev <netdev@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2())
Date: Wed, 11 Sep 2019 12:58:29 +0100	[thread overview]
Message-ID: <CAHk-=wj_jneK+UYzHhjwsH0XxP0knM+2o2OeFVEz-FjuQ77-ow@mail.gmail.com> (raw)
In-Reply-To: <feecebfcceba521703f13c8ee7f5bb9016924cb6.camel@sipsolutions.net>

On Wed, Sep 11, 2019 at 11:26 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> Hi,
>
> > So I'm at LCA
>
> When did LCA move to Portugal? ;-))

Heh. I may be jetlagged and not thinking straight.  LCA/LPC ;)

> > Previous resume looks normal:
> > [snip]
> >    wlp2s0: Limiting TX power to 23 (23 - 0) dBm as advertised by
> > 54:ec:2f:05:70:2c
>
> Is that the message you meant?
>
> > I say _almost_, because I don't see the "No TX power limit" for the
> > country lookup (yes, Portugal) this time?
>
> because here you had it too, just a bit earlier. It usually comes when a
> beacon is received the first time, which depends on the AP timing.

Duh. I'm blind and didn't notice, because I was expecting it in the same order.

And I didn't think about it or double-check, because the errors that
then followed later _looked_ like that TX power failing that I thought
hadn't happened.

> I don't _think_ any of the above would be a reason to disconnect, but it
> clearly looks like the device got stuck at this point, since everything
> just fails afterwards.

Yeah, maybe the power stuff was just another effect of things getting
stuck, rather than the reason for it getting stuck.

So I probably mis-attributed the cause.

> >    WARNING: CPU: 4 PID: 1246 at net/mac80211/sta_info.c:1057
> > __sta_info_destroy_part2+0x147/0x150 [mac80211]
>
> Not really a surprise. Perhaps we shouldn't even WARN_ON() this, if the
> driver is stuck completely and returning errors to everything that
> doesn't help so much.
>
> Then again, the stack trace was helpful this time:
>
> >     ieee80211_set_disassoc+0xc2/0x590 [mac80211]
> >     ieee80211_mgd_deauth.cold+0x4a/0x1b8 [mac80211]
> >     cfg80211_mlme_deauth+0xb3/0x1d0 [cfg80211]
> >     cfg80211_mlme_down+0x66/0x90 [cfg80211]
> >     cfg80211_disconnect+0x129/0x1e0 [cfg80211]
> >     cfg80211_leave+0x27/0x40 [cfg80211]
> >     cfg80211_netdev_notifier_call+0x1a7/0x4e0 [cfg80211]
> >     notifier_call_chain+0x4c/0x70
> >     __dev_close_many+0x57/0x100
> >     dev_close_many+0x8d/0x140
> >     dev_close.part.0+0x44/0x70
> >     cfg80211_shutdown_all_interfaces+0x71/0xd0 [cfg80211]
> >     cfg80211_rfkill_set_block+0x22/0x30 [cfg80211]
> >     rfkill_set_block+0x92/0x140 [rfkill]
> >     rfkill_fop_write+0x11f/0x1c0 [rfkill]
> >     vfs_write+0xb6/0x1a0
>
>
> Since we see that something actually did an rfkill operation. Did you
> push a button there?

No, I tried to turn off and turn on Wifi manually (no button, just the
settings panel).

I didn't notice the WARN_ON(), I just noticed that there was no
networking, and "turn it off and on again" is obviously the first
thing to try ;)

> You don't happen to have timing information on these logs, perhaps
> recorded in the logfile/journal?

Sure. I cleaned up the logs to not spam people with lots of illegible
data, but it's all in the journal log.

Rough timeline:

Sep 11 03:40:00 xps13 kernel: PM: suspend entry (s2idle)
Sep 11 03:40:00 xps13 kernel: Filesystems sync: 0.028 seconds
...
Sep 11 10:13:14 xps13 kernel: Restarting tasks ... done.
Sep 11 10:13:14 xps13 kernel: PM: suspend exit
Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: UART prints enabled
Sep 11 10:13:14 xps13 kernel: ath10k_pci 0000:02:00.0: unsupported HTC
service id: 1536
Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticate with 54:ec:2f:05:70:2c
Sep 11 10:13:23 xps13 kernel: wlp2s0: send auth to 54:ec:2f:05:70:2c (try 1/3)
Sep 11 10:13:23 xps13 kernel: wlp2s0: authenticated
Sep 11 10:13:23 xps13 kernel: wlp2s0: Limiting TX power to 23 (23 - 0)
dBm as advertised by 54:ec:2f:05:70:2c
...
Sep 11 10:13:23 xps13 kernel: ath: regdomain 0x826c dynamically
updated by country element
Sep 11 10:13:24 xps13 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0:
link becomes ready
Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: wmi command
16387 timeout, restarting hardware
...
Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: failed to read
hi_board_data address: -16
Sep 11 10:27:10 xps13 kernel: ath10k_pci 0000:02:00.0: failed to
receive initialized event from target: 00000000
Sep 11 10:27:13 xps13 kernel: ath10k_pci 0000:02:00.0: failed to
receive initialized event from target: 00000000
...
Sep 11 10:27:13 xps13 kernel: WARNING: CPU: 4 PID: 1246 at
net/mac80211/sta_info.c:1057 __sta_info_destroy_part2+0x147/0x150
[mac80211]

but if you want full logs I can send them in private to you.

I do suspect it's atheros and suspend/resume or something. The
wireless clearly worked for a while after the resume, but then at some
point it stopped.

> It seems odd to me, since the RTNL is acquired by
> cfg80211_rfkill_set_block() and that doesn't even have an error path, it
> just does
>         rtnl_lock();
>         cfg80211_shutdown_all_interfaces(&rdev->wiphy);
>         rtnl_unlock();
>
> The only explanation I therefore have is that something is just taking
> *forever* in that code path, hence my question about timing information
> on the logs.

Yeah, maybe it would time out everything eventually. But not for a
long time. It hadn't cleared up by

  Sep 11 10:36:21 xps13 gnome-session-f[6837]: gnome-session-failed:
Fatal IO error 0 (Success) on X server :0.

which is when I shut down the machine (and had to then force a hard
power-off because the shutdown wanted things that needed the rtnl_lock
to go away)

                 Linus

  reply	other threads:[~2019-09-11 11:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11 10:05 WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2()) Linus Torvalds
2019-09-11 10:26 ` Johannes Berg
2019-09-11 11:58   ` Linus Torvalds [this message]
2019-09-11 12:04     ` Johannes Berg
2019-09-11 13:03       ` Ben Greear
2019-09-11 13:21         ` Linus Torvalds
2019-09-11 13:31           ` Ben Greear
2019-09-11 13:32       ` Kalle Valo
2019-09-11 18:19   ` Kalle Valo
2019-09-11 18:23     ` Johannes Berg
2019-09-11 18:48       ` Kalle Valo
2019-09-11 18:10 ` Kalle Valo
2019-10-20 15:12   ` Tomislav Požega
2019-10-21 14:01     ` Ben Greear

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wj_jneK+UYzHhjwsH0XxP0knM+2o2OeFVEz-FjuQ77-ow@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=johannes@sipsolutions.net \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).