linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Hillf Danton <hdanton@sina.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
	syzbot <syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, paulmck@kernel.org,
	syzkaller-bugs@googlegroups.com,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
Date: Tue, 14 Sep 2021 16:58:08 +0200
Message-ID: <87v933b3wf.ffs@tglx>
In-Reply-To: <20210914123726.4219-1-hdanton@sina.com>

On Tue, Sep 14 2021 at 20:37, Hillf Danton wrote:

> On Mon, 13 Sep 2021 12:28:14 +0200 Thomas Gleixner wrote:
>>On Tue, Aug 31 2021 at 15:45, Hillf Danton wrote:
>>> On Mon, 30 Aug 2021 12:58:58 +0200 Dmitry Vyukov wrote:
>>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>>>
>>> Add debug info only to help kasan catch the timer running longer than 2 ticks.
>>>
>>> Is it anything in the right direction, tglx?
>>
>>Not really. As Dmitry pointed out this seems to be related to
>
> Thanks for taking a look.
>
>>mac80211_hwsim and if you look at the above stacktrace then how is
>>adding something to the timer wheel helpful?
>
> Given the stall was printed on CPU1 while the supposedly offending timer was
> expiring on CPU0, what was proposed is the lame debug info only for kasan to
> catch the timer red handed.
>
> It is more appreciated if the tglx dude would likely spend a couple of minutes
> giving us a lesson on the expertises needed for collecting evidence that any
> timer runs longer than two ticks. It helps beyond the extent of kasan.

That tglx dude already picked the relevant part of the stack trace (see
also above):

>>>>>  ieee80211_iterate_active_interfaces_atomic+0x70/0x180 net/mac80211/util.c:829
>>>>>  mac80211_hwsim_beacon+0xd5/0x1a0 drivers/net/wireless/mac80211_hwsim.c:1861
>>>>>  __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
>>>>>  __hrtimer_run_queues+0x609/0xe50 kernel/time/hrtimer.c:1601
>>>>>  hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1618
>>>>>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

and then asked the question how a timer wheel timer runtime check
helps. He just omitted the appendix "if the timer in question is a
hrtimer" as he assumed that this is pretty obvious from the stack trace.

Aside from that, if the wireless timer callback runs in an endless loop,
how does a runtime detection of that in the hrtimer softirq invocation
help to decode the problem when the stall detector already catches it
hanging there?

Now that mac80211 hrtimer callback might actually not be the real
problem. It certainly contains a bunch of loops, but I couldn't find
an endless loop there during a cursory inspection.

But that callback does rearm the hrtimer and that made me look at
hrtimer_run_queues() which might be the reason for the endless loop as
it only terminates when there is no timer to expire anymore.

Now what happens when the mac80211 callback rearms the timer so it
expires immediately again:

        hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
                        ns_to_ktime(bcn_int * NSEC_PER_USEC));

bcn_int is a user space controlled value. Now let's assume that bcn_int
is <= 1, which would certainly cause the loop in hrtimer_run_queues() to
keep looping forever.

That should be easy to verify by implementing a simple test which
reschedules a hrtimer from the callback with an expiry time close to now.
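
Short of a real kernel test, the arithmetic behind that suspicion can be
sketched in user space. The model below is only an illustration under
stated assumptions, not kernel code: every name (model_timer,
model_rearm, model_run_expired, model_drains) and every number is made
up, the clock is assumed to advance by a fixed callback cost per
invocation, and the rearm is assumed to move the expiry forward by
exactly one interval from the previous expiry, as the quoted
hrtimer_forward() call appears to do. It shows that such a loop only
drains when the interval is larger than the callback cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; names and numbers are illustrative. */
struct model_timer {
	int64_t expires_ns;	/* absolute expiry time */
	int64_t interval_ns;	/* rearm period */
};

/*
 * Rearm relative to the previous expiry rather than to the current
 * time. Assumption: this mirrors passing hrtimer_get_expires(timer)
 * as the "now" argument, so the expiry advances by one interval.
 */
static void model_rearm(struct model_timer *t)
{
	t->expires_ns += t->interval_ns;
}

/*
 * Run the callback as long as the timer is due, giving up after
 * max_iter invocations. Returns the number of invocations.
 */
static long model_run_expired(struct model_timer *t, int64_t *now_ns,
			      int64_t cb_cost_ns, long max_iter)
{
	long runs = 0;

	while (runs < max_iter && t->expires_ns <= *now_ns) {
		*now_ns += cb_cost_ns;	/* time spent in the callback */
		model_rearm(t);		/* callback rearms itself */
		runs++;
	}
	return runs;
}

/* True if the loop ran out of expired timers before hitting max_iter. */
static bool model_drains(int64_t interval_ns, int64_t cb_cost_ns,
			 long max_iter)
{
	struct model_timer t = { .expires_ns = 0, .interval_ns = interval_ns };
	int64_t now_ns = 0;

	assert(max_iter > 0);
	return model_run_expired(&t, &now_ns, cb_cost_ns, max_iter) < max_iter;
}
```

With an assumed callback cost of 2us, model_drains(1000, 2000, 1000000)
returns false because a 1us interval never gets ahead of the clock,
while model_drains(102400000, 2000, 1000000) returns true after a single
invocation. Whether the real callback cost and the real loop behave like
this is exactly what the kernel test would have to show.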

Not today as I'm about to head home to fire up the pizza oven.

Thanks,

        tglx

