Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode

From: Dmitry Vyukov <dvyukov@google.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, paulmck@kernel.org,
	syzkaller-bugs@googlegroups.com,
	Peter Zijlstra <peterz@infradead.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	linux-wireless@vger.kernel.org
Subject: Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
Date: Wed, 15 Sep 2021 11:14:06 +0200	[thread overview]
Message-ID: <CACT4Y+avKp8LCS8vBdaFLXFNcNiCq3vF-8K59o7c1oy86v-ADA@mail.gmail.com> (raw)
In-Reply-To: <87mtoeb4hb.ffs@tglx>

On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> > On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Now what happens when the mac80211 callback rearms the timer so it
> >> expires immediately again:
> >>
> >>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> >>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
> >>
> >> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
> >> which would certainly cause the loop in hrtimer_run_queues() to keeping
> >> looping forever.
> >>
> >> That should be easy to verify by implementing a simple test which
> >> reschedules a hrtimer from the callback with a expiry time close to now.
> >>
> >> Not today as I'm about to head home to fire up the pizza oven.
> >
> > This question definitely shouldn't take priority over the pizza. But I
> > think I saw this "rearm a timer with a user-controlled value without
> > any checks" pattern lots of times and hangs are inherently harder to
> > localize and reproduce. So I wonder if it makes sense to add a debug
> > config that would catch such cases right when the timer is set up
> > (issue a WARNING)?
>
> Yes and no. It's hard to differentiate between a valid short expiry
> rearm and something which is caused by unchecked values. I have some
> ideas but all of them are expensive and therefore probably debug
> only. Which is actually better than nothing :)
>
> > However, for automated testing there is the usual question of
> > balancing between false positives and false negatives. The check
> > should not produce false positives, but at the same time it should
> > catch [almost] all actual stalls so that they don't manifest as
> > duplicate stall reports.
>
> Right. The problem could be even there with checked values:
>
>        start_timer(1ms)
>        timer_expires()
>          callback()
>            forward_timer(timer, now, period(1ms));
>
> which might be perfectly fine with a production kernel as it leaves
> enough time to make overall progress.
>
> Now with a full debug kernel with all bells and whistels that callback
> might just run into this situation:
>
>       start_timer(1ms) T0
>        timer_expires() T1
>          callback()
>            do_stuff()
>            forward_timer(timer, TNOW, period(1ms));
>
>
> T1 - T0   = 1.001ms
> TNOW - T1 = 0.998 ms
>
> So the forward will just rearm it to T0 + 2ms which means it expires in
> 1us.
>
> > If I understand it correctly the timer is not actually set up as
> > periodic, but rather each callback invocation arms it again. Setting
> > up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> > so the check needs to be somewhat more elaborate and detect "infinite"
> > rearming.
>
> Yes.
>
> That made me actually look at that mac80211_hwsim callback again.
>
>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>
> So what this does is really wrong because it tries to schedule the timer
> on the theoretical periodic timeline. Which goes really south once the
> timer is late or the callback execution took longer than the
> period. Hypervisors scheduling out a VCPU at the wrong place will do
> that for you nicely.

Nice!

You mentioned that hrtimer_run_queues() may not return. Does it mean
that it can just loop executing the same re-armed callback again and
again? Maybe then the debug check condition should be that
hrtimer_run_queues() runs the same callback more than N times w/o
returning?

> What this actually should use is hrtimer_forward_now() which prevents
> that problem because it will forward the timer in the periodic schedule
> beyond now. That won't prevent the above corner case, but I doubt you
> can create an endless loop with that scenario as easy as you can with
> trying to catch up on your theoretical timeline by using the previous
> expiry time as a base for the forward. Patch below.
>
> /me goes off to audit hrtimer_forward() usage. Sigh...
>
> After that figure out ways to debug or even prevent this. More sigh...
>
> Thanks,
>
>         tglx
> ---
>  drivers/net/wireless/mac80211_hwsim.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/net/wireless/mac80211_hwsim.c
> +++ b/drivers/net/wireless/mac80211_hwsim.c
> @@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
>                 bcn_int -= data->bcn_delta;
>                 data->bcn_delta = 0;
>         }
> -       hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
> -                       ns_to_ktime(bcn_int * NSEC_PER_USEC));
> +       hrtimer_forward_now(&data->beacon_timer,
> +                           ns_to_ktime(bcn_int * NSEC_PER_USEC));
>         return HRTIMER_RESTART;
>  }
>