linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com>,
	linux-kernel@vger.kernel.org, paulmck@kernel.org,
	syzkaller-bugs@googlegroups.com,
	Peter Zijlstra <peterz@infradead.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	linux-wireless@vger.kernel.org
Subject: Re: [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode
Date: Wed, 15 Sep 2021 10:57:52 +0200	[thread overview]
Message-ID: <87mtoeb4hb.ffs@tglx> (raw)
In-Reply-To: <CACT4Y+Yd3pEfZhRUQS9ymW+sQZ4O58Dz714xSqoZvdKa_9s2oQ@mail.gmail.com>

On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote:
> On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Now what happens when the mac80211 callback rearms the timer so it
>> expires immediately again:
>>
>>         hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
>>                         ns_to_ktime(bcn_int * NSEC_PER_USEC));
>>
>> bcn is a user space controlled value. Now lets assume that bcn_int is <=1,
>> which would certainly cause the loop in hrtimer_run_queues() to keeping
>> looping forever.
>>
>> That should be easy to verify by implementing a simple test which
>> reschedules a hrtimer from the callback with a expiry time close to now.
>>
>> Not today as I'm about to head home to fire up the pizza oven.
>
> This question definitely shouldn't take priority over the pizza. But I
> think I saw this "rearm a timer with a user-controlled value without
> any checks" pattern lots of times and hangs are inherently harder to
> localize and reproduce. So I wonder if it makes sense to add a debug
> config that would catch such cases right when the timer is set up
> (issue a WARNING)?

Yes and no. It's hard to differentiate between a valid short expiry
rearm and something which is caused by unchecked values. I have some
ideas but all of them are expensive and therefore probably debug
only. Which is actually better than nothing :)

> However, for automated testing there is the usual question of
> balancing between false positives and false negatives. The check
> should not produce false positives, but at the same time it should
> catch [almost] all actual stalls so that they don't manifest as
> duplicate stall reports.

Right. The problem could be even there with checked values:

       start_timer(1ms)
       timer_expires()
         callback()
           forward_timer(timer, now, period(1ms));

which might be perfectly fine with a production kernel as it leaves
enough time to make overall progress.

Now with a full debug kernel with all bells and whistels that callback
might just run into this situation:

      start_timer(1ms) T0
       timer_expires() T1
         callback()
           do_stuff()
           forward_timer(timer, TNOW, period(1ms));


T1 - T0   = 1.001ms
TNOW - T1 = 0.998 ms

So the forward will just rearm it to T0 + 2ms which means it expires in
1us.

> If I understand it correctly the timer is not actually set up as
> periodic, but rather each callback invocation arms it again. Setting
> up a timer for 1 ns _once_ (or few times) is probably fine (right?),
> so the check needs to be somewhat more elaborate and detect "infinite"
> rearming.

Yes.

That made me actually look at that mac80211_hwsim callback again.

	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
			ns_to_ktime(bcn_int * NSEC_PER_USEC));

So what this does is really wrong because it tries to schedule the timer
on the theoretical periodic timeline. Which goes really south once the
timer is late or the callback execution took longer than the
period. Hypervisors scheduling out a VCPU at the wrong place will do
that for you nicely.

What this actually should use is hrtimer_forward_now() which prevents
that problem because it will forward the timer in the periodic schedule
beyond now. That won't prevent the above corner case, but I doubt you
can create an endless loop with that scenario as easy as you can with
trying to catch up on your theoretical timeline by using the previous
expiry time as a base for the forward. Patch below.

/me goes off to audit hrtimer_forward() usage. Sigh...

After that figure out ways to debug or even prevent this. More sigh...

Thanks,

        tglx
---
 drivers/net/wireless/mac80211_hwsim.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti
 		bcn_int -= data->bcn_delta;
 		data->bcn_delta = 0;
 	}
-	hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
-			ns_to_ktime(bcn_int * NSEC_PER_USEC));
+	hrtimer_forward_now(&data->beacon_timer,
+			    ns_to_ktime(bcn_int * NSEC_PER_USEC));
 	return HRTIMER_RESTART;
 }
 

  parent reply	other threads:[~2021-09-15  8:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-28  4:52 [syzbot] INFO: rcu detected stall in syscall_exit_to_user_mode syzbot
2021-08-30 10:58 ` Dmitry Vyukov
     [not found] ` <20210831074532.2255-1-hdanton@sina.com>
2021-09-13 10:28   ` Thomas Gleixner
     [not found]   ` <20210914123726.4219-1-hdanton@sina.com>
2021-09-14 14:58     ` Thomas Gleixner
2021-09-14 18:00       ` Dmitry Vyukov
2021-09-14 18:31         ` Paul E. McKenney
2021-09-15  9:36           ` Thomas Gleixner
2021-09-15  8:57         ` Thomas Gleixner [this message]
2021-09-15  9:14           ` Dmitry Vyukov
2021-09-15  9:32             ` Thomas Gleixner
2021-09-16  9:24               ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mtoeb4hb.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=dvyukov@google.com \
    --cc=hdanton@sina.com \
    --cc=johannes.berg@intel.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=syzbot+0e964fad69a9c462bc1e@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).