linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: luca abeni <luca.abeni@santannapisa.it>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@gmail.com>,
	syzbot <syzbot+385468161961cee80c31@syzkaller.appspotmail.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	LKML <linux-kernel@vger.kernel.org>,
	mingo@redhat.com, nstange@suse.de,
	syzkaller-bugs@googlegroups.com, henrik@austad.us,
	Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
	Claudio Scordino <claudio@evidence.eu.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: INFO: rcu detected stall in do_idle
Date: Wed, 24 Oct 2018 14:03:35 +0200	[thread overview]
Message-ID: <20181024120335.GE29272@localhost.localdomain> (raw)
In-Reply-To: <20181019225005.61707c64@nowhere>

On 19/10/18 22:50, luca abeni wrote:
> On Fri, 19 Oct 2018 13:39:42 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Oct 18, 2018 at 01:08:11PM +0200, luca abeni wrote:
> > > Ok, I see the issue now: the problem is that the "while
> > > (dl_se->runtime <= 0)" loop is executed at replenishment time, but
> > > the deadline should be postponed at enforcement time.
> > > 
> > > I mean: in update_curr_dl() we do:
> > > 	dl_se->runtime -= scaled_delta_exec;
> > > 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > 		...
> > > 		enqueue replenishment timer at dl_next_period(dl_se)
> > > But dl_next_period() is based on a "wrong" deadline!
> > > 
> > > 
> > > I think that inserting a
> > >         while (dl_se->runtime <= -pi_se->dl_runtime) {
> > >                 dl_se->deadline += pi_se->dl_period;
> > >                 dl_se->runtime += pi_se->dl_runtime;
> > >         }
> > > immediately after "dl_se->runtime -= scaled_delta_exec;" would fix
> > > the problem, no?  
> > 
> > That certainly makes sense to me.
> 
> Good; I'll try to work on this idea in the weekend.

So, we (me and Luca) managed to spend some more time on this and found a
few more things worth sharing. I'll try to summarize what we have got so
far (including what already discussed) because impression is that each
point might deserve a fix or at least consideration (just amazing how a
simple random fuzzer thing can highlight all that :). Apologies for the
long email.

Reproducer runs on a CONFIG_HZ=100, CONFIG_IRQ_TIME_ACCOUNTING kernel
and does something like this (only the bits that seems to matter here)

int main(void)
{
  [...]
  [setup stuff at 0x2001d000]
  syscall(__NR_perf_event_open, 0x2001d000, 0, -1, -1, 0);
  *(uint32_t*)0x20000000 = 0;
  *(uint32_t*)0x20000004 = 6;
  *(uint64_t*)0x20000008 = 0;
  *(uint32_t*)0x20000010 = 0;
  *(uint32_t*)0x20000014 = 0;
  *(uint64_t*)0x20000018 = 0x9917; <-- ~40us
  *(uint64_t*)0x20000020 = 0xffff; <-- ~65us (~60% bandwidth)
  *(uint64_t*)0x20000028 = 0;
  syscall(__NR_sched_setattr, 0, 0x20000000, 0);
  [busy loop]
  return 0;
}

And this causes problems because the task is actually never throttled.

Pain points:

 1. Granularity of enforcement (at each tick) is huge compared with
    the task runtime. This makes starting the replenishment timer,
    when runtime is depleted, always to fail (because old deadline
    is way in the past). So, the task is fully replenished and put
    back to run.

    - Luca's proposal should help here, since the deadline is postponed
      at throttling time, and replenishment timer set to that (and it
      should be in the future)

 1.1 Even if we fix 1. in a configuration like this, the task would
     still be able to run for ~10ms (worst case) and potentially starve
     other tasks. It doesn't seem a too big interval maybe, but there
     might be other very short activities that might miss an occasion
     to run "quickly".

     - Might be fixed by imposing (via sysctl) reasonable defaults for
       minimum runtime (w.r.t. HZ, like HZ/2) and maximum for period
       (as also a very small bandwidth task can have a big runtime if
       period is big as well)

 (1.2) When runtime becomes very negative (because delta_exec was big)
       we seem to spend lot of time inside the replenishment loop.

       - Not sure it's such a big problem, might need more profiling.
         Feeling is that once the other points will be addressed this
	 won't matter anymore

 2. This is related to perf_event_open syscall reproducer does before
    becoming DEADLINE and entering the busy loop. Enabling of perf
    swevents generates lot of hrtimers load that happens in the
    reproducer task context. Now, DEADLINE uses rq_clock() for setting
    deadlines, but rq_clock_task() for doing runtime enforcement.
    In a situation like this it seems that the amount of irq pressure
    becomes pretty big (I'm seeing this on kvm, real hw should maybe do
    better, pain point remains I guess), so rq_clock() and
    rq_clock_task() might become more a more skewed w.r.t. each other.
    Since rq_clock() is only used when setting absolute deadlines for
    the first time (or when resetting them in certain cases), after a
    bit the replenishment code will start to see postponed deadlines
    always in the past w.r.t. rq_clock(). And this brings us back to the
    fact that the task is never stopped, since it can't keep up with
    rq_clock().

    - Not sure yet how we want to address this [1]. We could use
      rq_clock() everywhere, but tasks might be penalized by irq
      pressure (theoretically this would mandate that irqs are
      explicitly accounted for I guess). I tried to use the skew between
      the two clocks to "fix" deadlines, but that puts us at risks of
      de-synchronizing userspace and kernel views of deadlines.

 3. HRTICK is not started for new entities.
 
    - Already got a patch for it.

This should be it, I hope. Luca (thanks a lot for your help) and please
add or correct me if I was wrong.

Thoughts?

Best,

- Juri

1 - https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L1162

  reply	other threads:[~2018-10-24 12:03 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-13  7:31 INFO: rcu detected stall in do_idle syzbot
2018-10-16 13:24 ` Thomas Gleixner
2018-10-16 14:03   ` Peter Zijlstra
2018-10-16 14:41     ` Juri Lelli
2018-10-16 14:45       ` Thomas Gleixner
2018-10-16 15:36         ` Juri Lelli
2018-10-18  8:28           ` Juri Lelli
2018-10-18  9:48             ` Peter Zijlstra
2018-10-18 10:10               ` Juri Lelli
2018-10-18 10:38                 ` luca abeni
2018-10-18 10:33               ` luca abeni
2018-10-19 13:14                 ` Peter Zijlstra
2018-10-18 10:23             ` luca abeni
2018-10-18 10:47               ` Juri Lelli
2018-10-18 11:08                 ` luca abeni
2018-10-18 12:21                   ` Juri Lelli
2018-10-18 12:36                     ` luca abeni
2018-10-19 11:39                   ` Peter Zijlstra
2018-10-19 20:50                     ` luca abeni
2018-10-24 12:03                       ` Juri Lelli [this message]
2018-10-27 11:16                         ` Dmitry Vyukov
2018-10-28  8:33                           ` Juri Lelli
2018-10-30 10:45                         ` Peter Zijlstra
2018-10-30 11:08                           ` luca abeni
2018-10-31 16:18                             ` Daniel Bristot de Oliveira
2018-10-31 16:40                               ` Juri Lelli
2018-10-31 17:39                                 ` Peter Zijlstra
2018-10-31 17:58                                 ` Daniel Bristot de Oliveira
2018-11-01  5:55                                   ` Juri Lelli
2018-11-02 10:00                                     ` Daniel Bristot de Oliveira
2018-11-05 10:55                                       ` Juri Lelli
2018-11-07 10:12                                         ` Daniel Bristot de Oliveira
2018-10-31 17:38                               ` Peter Zijlstra
2018-10-30 11:12                           ` Juri Lelli
2018-11-06 11:44                             ` Juri Lelli
2021-10-27  0:59 Hao Sun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181024120335.GE29272@localhost.localdomain \
    --to=juri.lelli@redhat.com \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=claudio@evidence.eu.com \
    --cc=henrik@austad.us \
    --cc=hpa@zytor.com \
    --cc=juri.lelli@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luca.abeni@santannapisa.it \
    --cc=mingo@redhat.com \
    --cc=nstange@suse.de \
    --cc=peterz@infradead.org \
    --cc=syzbot+385468161961cee80c31@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@santannapisa.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).