Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode

From: Thomas Gleixner <tglx@linutronix.de>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, paulmck@kernel.org,
	syzkaller-bugs@googlegroups.com,
	Peter Zijlstra <peterz@infradead.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	linux-wireless@vger.kernel.org
Subject: Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
Date: Wed, 15 Sep 2021 10:57:52 +0200	[thread overview]
Message-ID: <87mtoeb4hb.ffs@tglx> (raw)
In-Reply-To: <CACT4Y+Yd3pEfZhRUQS9ymW+sQZ4O58Dz714xSqoZvdKa_9s2oQ@mail.gmail.com>

On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Now what happens when the mac80211 callback rearms the timer so it
>> expires immediately again:
>>
>>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>>
>> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
>> which would certainly cause the loop in hrtimer_run_queues() to keeping
>> looping forever.
>>
>> That should be easy to verify by implementing a simple test which
>> reschedules a hrtimer from the callback with a expiry time close to now.
>>
>> Not today as I'm about to head home to fire up the pizza oven.
>
> This question definitely shouldn't take priority over the pizza. But I
> think I saw this "rearm a timer with a user-controlled value without
> any checks" pattern lots of times and hangs are inherently harder to
> localize and reproduce. So I wonder if it makes sense to add a debug
> config that would catch such cases right when the timer is set up
> (issue a WARNING)?

Yes and no. It's hard to differentiate between a valid short expiry
rearm and something which is caused by unchecked values. I have some
ideas but all of them are expensive and therefore probably debug
only. Which is actually better than nothing :)

> However, for automated testing there is the usual question of
> balancing between false positives and false negatives. The check
> should not produce false positives, but at the same time it should
> catch [almost] all actual stalls so that they don't manifest as
> duplicate stall reports.

Right. The problem could be even there with checked values:

       start_timer(1ms)
       timer_expires()
         callback()
           forward_timer(timer, now, period(1ms));

which might be perfectly fine with a production kernel as it leaves
enough time to make overall progress.

Now with a full debug kernel with all bells and whistels that callback
might just run into this situation:

      start_timer(1ms) T0
       timer_expires() T1
         callback()
           do_stuff()
           forward_timer(timer, TNOW, period(1ms));


T1 - T0   = 1.001ms
TNOW - T1 = 0.998 ms

So the forward will just rearm it to T0 + 2ms which means it expires in
1us.

> If I understand it correctly the timer is not actually set up as
> periodic, but rather each callback invocation arms it again. Setting
> up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> so the check needs to be somewhat more elaborate and detect "infinite"
> rearming.

Yes.

That made me actually look at that mac80211_hwsim callback again.

	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
			ns_to_ktime(bcn_int * NSEC_PER_USEC));

So what this does is really wrong because it tries to schedule the timer
on the theoretical periodic timeline. Which goes really south once the
timer is late or the callback execution took longer than the
period. Hypervisors scheduling out a VCPU at the wrong place will do
that for you nicely.

What this actually should use is hrtimer_forward_now() which prevents
that problem because it will forward the timer in the periodic schedule
beyond now. That won't prevent the above corner case, but I doubt you
can create an endless loop with that scenario as easy as you can with
trying to catch up on your theoretical timeline by using the previous
expiry time as a base for the forward. Patch below.

/me goes off to audit hrtimer_forward() usage. Sigh...

After that figure out ways to debug or even prevent this. More sigh...

Thanks,

        tglx
---
 drivers/net/wireless/mac80211_hwsim.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
 		bcn_int -= data->bcn_delta;
 		data->bcn_delta = 0;
 	}
-	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
-			ns_to_ktime(bcn_int * NSEC_PER_USEC));
+	hrtimer_forward_now(&data->beacon_timer,
+			    ns_to_ktime(bcn_int * NSEC_PER_USEC));
 	return HRTIMER_RESTART;
 }