From: luca abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Juri Lelli <juri.lelli@gmail.com>,
syzbot <syzbot+385468161961cee80c31@syzkaller.appspotmail.com>,
Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
LKML <linux-kernel@vger.kernel.org>,
mingo@redhat.com, nstange@suse.de,
syzkaller-bugs@googlegroups.com, henrik@austad.us,
Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
Claudio Scordino <claudio@evidence.eu.com>,
Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: INFO: rcu detected stall in do_idle
Date: Tue, 30 Oct 2018 12:08:04 +0100 [thread overview]
Message-ID: <20181030120804.2f30c2da@sweethome> (raw)
In-Reply-To: <20181030104554.GB8177@hirez.programming.kicks-ass.net>
Hi Peter,
On Tue, 30 Oct 2018 11:45:54 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> > 2. This is related to perf_event_open syscall reproducer does
> > before becoming DEADLINE and entering the busy loop. Enabling of
> > perf swevents generates lot of hrtimers load that happens in the
> > reproducer task context. Now, DEADLINE uses rq_clock() for
> > setting deadlines, but rq_clock_task() for doing runtime
> > enforcement. In a situation like this it seems that the amount of
> > irq pressure becomes pretty big (I'm seeing this on kvm, real hw
> > should maybe do better, pain point remains I guess), so rq_clock()
> > and rq_clock_task() might become more a more skewed w.r.t. each
> > other. Since rq_clock() is only used when setting absolute
> > deadlines for the first time (or when resetting them in certain
> > cases), after a bit the replenishment code will start to see
> > postponed deadlines always in the past w.r.t. rq_clock(). And this
> > brings us back to the fact that the task is never stopped, since it
> > can't keep up with rq_clock().
> >
> > - Not sure yet how we want to address this [1]. We could use
> > rq_clock() everywhere, but tasks might be penalized by irq
> > pressure (theoretically this would mandate that irqs are
> > explicitly accounted for I guess). I tried to use the skew
> > between the two clocks to "fix" deadlines, but that puts us at
> > risks of de-synchronizing userspace and kernel views of deadlines.
>
> Hurm.. right. We knew of this issue back when we did it.
> I suppose now it hurts and we need to figure something out.
>
> By virtue of being a real-time class, we do indeed need to have
> deadline on the wall-clock. But if we then don't account runtime on
> that same clock, but on a potentially slower clock, we get the
> problem that we can run longer than our period/deadline, which is
> what we're running into here I suppose.
I might be hugely misunderstanding something here, but in my impression
the issue is just that if the IRQ time is not accounted to the
-deadline task, then the non-deadline tasks might be starved.
I do not see this as a skew between two clocks, but as an accounting
thing:
- if we decide that the IRQ time is accounted to the -deadline
task (this is what happens with CONFIG_IRQ_TIME_ACCOUNTING disabled),
then the non-deadline tasks are not starved (but of course the
-deadline tasks executes for less than its reserved time in the
period);
- if we decide that the IRQ time is not accounted to the -deadline task
(this is what happens with CONFIG_IRQ_TIME_ACCOUNTING enabled), then
the -deadline task executes for the expected amount of time (about
60% of the CPU time), but an IRQ load of 40% will starve non-deadline
tasks (this is what happens in the bug that triggered this discussion)
I think this might be seen as an adimission control issue: when
CONFIG_IRQ_TIME_ACCOUNTING is disabled, the IRQ time is accounted for
in the admission control (because it ends up in the task's runtime),
but when CONFIG_IRQ_TIME_ACCOUNTING is enabled the IRQ time is not
accounted for in the admission test (the IRQ handler becomes some sort
of entity with a higher priority than -deadline tasks, on which no
accounting or enforcement is performed).
> And yes, at some point RT workloads need to be aware of the jitter
> injected by things like IRQs and such. But I believe the rationale was
> that for soft real-time workloads this current semantic was 'easier'
> because we get to ignore IRQ overhead for workload estimation etc.
>
> What we could maybe do is track runtime in both rq_clock_task() and
> rq_clock() and detect where the rq_clock based one exceeds the period
> and then push out the deadline (and add runtime).
>
> Maybe something along such lines; does that make sense?
Uhm... I have to study and test your patch... I'll comment on this
later.
Thanks,
Luca
>
> ---
> include/linux/sched.h | 3 +++
> kernel/sched/deadline.c | 53
> ++++++++++++++++++++++++++++++++----------------- 2 files changed, 38
> insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8f8a5418b627..6aec81cb3d2e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -522,6 +522,9 @@ struct sched_dl_entity {
> u64 deadline; /*
> Absolute deadline for this instance */ unsigned
> int flags; /* Specifying the
> scheduler behaviour */
> + u64 wallstamp;
> + s64 walltime;
> +
> /*
> * Some bool flags:
> *
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 91e4202b0634..633c8f36c700 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -683,16 +683,7 @@ static void replenish_dl_entity(struct
> sched_dl_entity *dl_se, if (dl_se->dl_yielded && dl_se->runtime > 0)
> dl_se->runtime = 0;
>
> - /*
> - * We keep moving the deadline away until we get some
> - * available runtime for the entity. This ensures correct
> - * handling of situations where the runtime overrun is
> - * arbitrary large.
> - */
> - while (dl_se->runtime <= 0) {
> - dl_se->deadline += pi_se->dl_period;
> - dl_se->runtime += pi_se->dl_runtime;
> - }
> + /* XXX what do we do with pi_se */
>
> /*
> * At this point, the deadline really should be "in
> @@ -1148,9 +1139,9 @@ static void update_curr_dl(struct rq *rq)
> {
> struct task_struct *curr = rq->curr;
> struct sched_dl_entity *dl_se = &curr->dl;
> - u64 delta_exec, scaled_delta_exec;
> + u64 delta_exec, scaled_delta_exec, delta_wall;
> int cpu = cpu_of(rq);
> - u64 now;
> + u64 now, wall;
>
> if (!dl_task(curr) || !on_dl_rq(dl_se))
> return;
> @@ -1171,6 +1162,17 @@ static void update_curr_dl(struct rq *rq)
> return;
> }
>
> + wall = rq_clock();
> + delta_wall = wall - dl_se->wallstamp;
> + if (delta_wall > 0) {
> + dl_se->walltime += delta_wall;
> + dl_se->wallstamp = wall;
> + }
> +
> + /* check if rq_clock_task() has been too slow */
> + if (unlikely(dl_se->walltime > dl_se->period))
> + goto throttle;
> +
> schedstat_set(curr->se.statistics.exec_max,
> max(curr->se.statistics.exec_max, delta_exec));
>
> @@ -1204,14 +1206,27 @@ static void update_curr_dl(struct rq *rq)
>
> dl_se->runtime -= scaled_delta_exec;
>
> -throttle:
> if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> +throttle:
> dl_se->dl_throttled = 1;
>
> - /* If requested, inform the user about runtime
> overruns. */
> - if (dl_runtime_exceeded(dl_se) &&
> - (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
> - dl_se->dl_overrun = 1;
> + if (dl_runtime_exceeded(dl_se)) {
> + /* If requested, inform the user about
> runtime overruns. */
> + if (dl_se->flags & SCHED_FLAG_DL_OVERRUN)
> + dl_se->dl_overrun = 1;
> +
> + }
> +
> + /*
> + * We keep moving the deadline away until we get
> some available
> + * runtime for the entity. This ensures correct
> handling of
> + * situations where the runtime overrun is arbitrary
> large.
> + */
> + while (dl_se->runtime <= 0 || dl_se->walltime >
> dl_se->period) {
> + dl_se->deadline += dl_se->dl_period;
> + dl_se->runtime += dl_se->dl_runtime;
> + dl_se->walltime -= dl_se->dl_period;
> + }
>
> __dequeue_task_dl(rq, curr, 0);
> if (unlikely(dl_se->dl_boosted
> || !start_dl_timer(curr))) @@ -1751,9 +1766,10 @@
> pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct
> rq_flags *rf) p = dl_task_of(dl_se);
> p->se.exec_start = rq_clock_task(rq);
> + dl_se->wallstamp = rq_clock(rq);
>
> /* Running task will never be pushed. */
> - dequeue_pushable_dl_task(rq, p);
> + dequeue_pushable_dl_task(rq, p);
>
> if (hrtick_enabled(rq))
> start_hrtick_dl(rq, p);
> @@ -1811,6 +1827,7 @@ static void set_curr_task_dl(struct rq *rq)
> struct task_struct *p = rq->curr;
>
> p->se.exec_start = rq_clock_task(rq);
> + p->dl_se.wallstamp = rq_clock(rq);
>
> /* You can't push away the running task */
> dequeue_pushable_dl_task(rq, p);
next prev parent reply other threads:[~2018-10-30 11:08 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-13 7:31 INFO: rcu detected stall in do_idle syzbot
2018-10-16 13:24 ` Thomas Gleixner
2018-10-16 14:03 ` Peter Zijlstra
2018-10-16 14:41 ` Juri Lelli
2018-10-16 14:45 ` Thomas Gleixner
2018-10-16 15:36 ` Juri Lelli
2018-10-18 8:28 ` Juri Lelli
2018-10-18 9:48 ` Peter Zijlstra
2018-10-18 10:10 ` Juri Lelli
2018-10-18 10:38 ` luca abeni
2018-10-18 10:33 ` luca abeni
2018-10-19 13:14 ` Peter Zijlstra
2018-10-18 10:23 ` luca abeni
2018-10-18 10:47 ` Juri Lelli
2018-10-18 11:08 ` luca abeni
2018-10-18 12:21 ` Juri Lelli
2018-10-18 12:36 ` luca abeni
2018-10-19 11:39 ` Peter Zijlstra
2018-10-19 20:50 ` luca abeni
2018-10-24 12:03 ` Juri Lelli
2018-10-27 11:16 ` Dmitry Vyukov
2018-10-28 8:33 ` Juri Lelli
2018-10-30 10:45 ` Peter Zijlstra
2018-10-30 11:08 ` luca abeni [this message]
2018-10-31 16:18 ` Daniel Bristot de Oliveira
2018-10-31 16:40 ` Juri Lelli
2018-10-31 17:39 ` Peter Zijlstra
2018-10-31 17:58 ` Daniel Bristot de Oliveira
2018-11-01 5:55 ` Juri Lelli
2018-11-02 10:00 ` Daniel Bristot de Oliveira
2018-11-05 10:55 ` Juri Lelli
2018-11-07 10:12 ` Daniel Bristot de Oliveira
2018-10-31 17:38 ` Peter Zijlstra
2018-10-30 11:12 ` Juri Lelli
2018-11-06 11:44 ` Juri Lelli
2021-10-27 0:59 Hao Sun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181030120804.2f30c2da@sweethome \
--to=luca.abeni@santannapisa.it \
--cc=bp@alien8.de \
--cc=bristot@redhat.com \
--cc=claudio@evidence.eu.com \
--cc=henrik@austad.us \
--cc=hpa@zytor.com \
--cc=juri.lelli@gmail.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nstange@suse.de \
--cc=peterz@infradead.org \
--cc=syzbot+385468161961cee80c31@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=tglx@linutronix.de \
--cc=tommaso.cucinotta@santannapisa.it \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).