From: Dmitry Vyukov <dvyukov@google.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Hillf Danton <hdanton@sina.com>,
syzbot <syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com>,
linux-kernel@vger.kernel.org, paulmck@kernel.org,
syzkaller-bugs@googlegroups.com,
Peter Zijlstra <peterz@infradead.org>,
kasan-dev <kasan-dev@googlegroups.com>,
Johannes Berg <johannes.berg@intel.com>,
Kalle Valo <kvalo@codeaurora.org>,
linux-wireless@vger.kernel.org
Subject: Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
Date: Wed, 15 Sep 2021 11:14:06 +0200 [thread overview]
Message-ID: <CACT4Y+avKp8LCS8vBdaFLXFNcNiCq3vF-8K59o7c1oy86v-ADA@mail.gmail.com> (raw)
In-Reply-To: <87mtoeb4hb.ffs@tglx>
On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> > On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Now what happens when the mac80211 callback rearms the timer so it
> >> expires immediately again:
> >>
> >> hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> >> ns_to_ktime(bcn_int * NSEC_PER_USEC));
> >>
> >> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
> >> which would certainly cause the loop in hrtimer_run_queues() to keeping
> >> looping forever.
> >>
> >> That should be easy to verify by implementing a simple test which
> >> reschedules a hrtimer from the callback with a expiry time close to now.
> >>
> >> Not today as I'm about to head home to fire up the pizza oven.
> >
> > This question definitely shouldn't take priority over the pizza. But I
> > think I saw this "rearm a timer with a user-controlled value without
> > any checks" pattern lots of times and hangs are inherently harder to
> > localize and reproduce. So I wonder if it makes sense to add a debug
> > config that would catch such cases right when the timer is set up
> > (issue a WARNING)?
>
> Yes and no. It's hard to differentiate between a valid short expiry
> rearm and something which is caused by unchecked values. I have some
> ideas but all of them are expensive and therefore probably debug
> only. Which is actually better than nothing :)
>
> > However, for automated testing there is the usual question of
> > balancing between false positives and false negatives. The check
> > should not produce false positives, but at the same time it should
> > catch [almost] all actual stalls so that they don't manifest as
> > duplicate stall reports.
>
> Right. The problem could be even there with checked values:
>
> start_timer(1ms)
> timer_expires()
> callback()
> forward_timer(timer, now, period(1ms));
>
> which might be perfectly fine with a production kernel as it leaves
> enough time to make overall progress.
>
> Now with a full debug kernel with all bells and whistels that callback
> might just run into this situation:
>
> start_timer(1ms) T0
> timer_expires() T1
> callback()
> do_stuff()
> forward_timer(timer, TNOW, period(1ms));
>
>
> T1 - T0 = 1.001ms
> TNOW - T1 = 0.998 ms
>
> So the forward will just rearm it to T0 + 2ms which means it expires in
> 1us.
>
> > If I understand it correctly the timer is not actually set up as
> > periodic, but rather each callback invocation arms it again. Setting
> > up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> > so the check needs to be somewhat more elaborate and detect "infinite"
> > rearming.
>
> Yes.
>
> That made me actually look at that mac80211_hwsim callback again.
>
> hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> ns_to_ktime(bcn_int * NSEC_PER_USEC));
>
> So what this does is really wrong because it tries to schedule the timer
> on the theoretical periodic timeline. Which goes really south once the
> timer is late or the callback execution took longer than the
> period. Hypervisors scheduling out a VCPU at the wrong place will do
> that for you nicely.
Nice!
You mentioned that hrtimer_run_queues() may not return. Does it mean
that it can just loop executing the same re-armed callback again and
again? Maybe then the debug check condition should be that
hrtimer_run_queues() runs the same callback more than N times w/o
returning?
> What this actually should use is hrtimer_forward_now() which prevents
> that problem because it will forward the timer in the periodic schedule
> beyond now. That won't prevent the above corner case, but I doubt you
> can create an endless loop with that scenario as easy as you can with
> trying to catch up on your theoretical timeline by using the previous
> expiry time as a base for the forward. Patch below.
>
> /me goes off to audit hrtimer_forward() usage. Sigh...
>
> After that figure out ways to debug or even prevent this. More sigh...
>
> Thanks,
>
> tglx
> ---
> drivers/net/wireless/mac80211_hwsim.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/net/wireless/mac80211_hwsim.c
> +++ b/drivers/net/wireless/mac80211_hwsim.c
> @@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
> bcn_int -= data->bcn_delta;
> data->bcn_delta = 0;
> }
> - hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> - ns_to_ktime(bcn_int * NSEC_PER_USEC));
> + hrtimer_forward_now(&data->beacon_timer,
> + ns_to_ktime(bcn_int * NSEC_PER_USEC));
> return HRTIMER_RESTART;
> }
>
next prev parent reply other threads:[~2021-09-15 9:14 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-28 4:52 [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode syzbot
2021-08-30 10:58 ` Dmitry Vyukov
[not found] ` <20210831074532.2255-1-hdanton@sina.com>
2021-09-13 10:28 ` Thomas Gleixner
[not found] ` <20210914123726.4219-1-hdanton@sina.com>
2021-09-14 14:58 ` Thomas Gleixner
2021-09-14 18:00 ` Dmitry Vyukov
2021-09-14 18:31 ` Paul E. McKenney
2021-09-15 9:36 ` Thomas Gleixner
2021-09-15 8:57 ` Thomas Gleixner
2021-09-15 9:14 ` Dmitry Vyukov [this message]
2021-09-15 9:32 ` Thomas Gleixner
2021-09-16 9:24 ` Dmitry Vyukov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACT4Y+avKp8LCS8vBdaFLXFNcNiCq3vF-8K59o7c1oy86v-ADA@mail.gmail.com \
--to=dvyukov@google.com \
--cc=hdanton@sina.com \
--cc=johannes.berg@intel.com \
--cc=kasan-dev@googlegroups.com \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).