linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: luca abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@gmail.com>,
	syzbot <syzbot+385468161961cee80c31@syzkaller.appspotmail.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	LKML <linux-kernel@vger.kernel.org>,
	mingo@redhat.com, nstange@suse.de,
	syzkaller-bugs@googlegroups.com, henrik@austad.us,
	Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
	Claudio Scordino <claudio@evidence.eu.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: INFO: rcu detected stall in do_idle
Date: Tue, 30 Oct 2018 12:08:04 +0100	[thread overview]
Message-ID: <20181030120804.2f30c2da@sweethome> (raw)
In-Reply-To: <20181030104554.GB8177@hirez.programming.kicks-ass.net>

Hi Peter,

On Tue, 30 Oct 2018 11:45:54 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> >  2. This is related to perf_event_open syscall reproducer does
> > before becoming DEADLINE and entering the busy loop. Enabling of
> > perf swevents generates lot of hrtimers load that happens in the
> >     reproducer task context. Now, DEADLINE uses rq_clock() for
> > setting deadlines, but rq_clock_task() for doing runtime
> > enforcement. In a situation like this it seems that the amount of
> > irq pressure becomes pretty big (I'm seeing this on kvm, real hw
> > should maybe do better, pain point remains I guess), so rq_clock()
> > and rq_clock_task() might become more a more skewed w.r.t. each
> > other. Since rq_clock() is only used when setting absolute
> > deadlines for the first time (or when resetting them in certain
> > cases), after a bit the replenishment code will start to see
> > postponed deadlines always in the past w.r.t. rq_clock(). And this
> > brings us back to the fact that the task is never stopped, since it
> > can't keep up with rq_clock().
> > 
> >     - Not sure yet how we want to address this [1]. We could use
> >       rq_clock() everywhere, but tasks might be penalized by irq
> >       pressure (theoretically this would mandate that irqs are
> >       explicitly accounted for I guess). I tried to use the skew
> > between the two clocks to "fix" deadlines, but that puts us at
> > risks of de-synchronizing userspace and kernel views of deadlines.  
> 
> Hurm.. right. We knew of this issue back when we did it.
> I suppose now it hurts and we need to figure something out.
> 
> By virtue of being a real-time class, we do indeed need to have
> deadline on the wall-clock. But if we then don't account runtime on
> that same clock, but on a potentially slower clock, we get the
> problem that we can run longer than our period/deadline, which is
> what we're running into here I suppose.

I might be hugely misunderstanding something here, but in my impression
the issue is just that if the IRQ time is not accounted to the
-deadline task, then the non-deadline tasks might be starved.

I do not see this as a skew between two clocks, but as an accounting
thing:
- if we decide that the IRQ time is accounted to the -deadline
  task (this is what happens with CONFIG_IRQ_TIME_ACCOUNTING disabled),
  then the non-deadline tasks are not starved (but of course the
  -deadline tasks executes for less than its reserved time in the
  period); 
- if we decide that the IRQ time is not accounted to the -deadline task
  (this is what happens with CONFIG_IRQ_TIME_ACCOUNTING enabled), then
  the -deadline task executes for the expected amount of time (about
  60% of the CPU time), but an IRQ load of 40% will starve non-deadline
  tasks (this is what happens in the bug that triggered this discussion)

I think this might be seen as an adimission control issue: when
CONFIG_IRQ_TIME_ACCOUNTING is disabled, the IRQ time is accounted for
in the admission control (because it ends up in the task's runtime),
but when CONFIG_IRQ_TIME_ACCOUNTING is enabled the IRQ time is not
accounted for in the admission test (the IRQ handler becomes some sort
of entity with a higher priority than -deadline tasks, on which no
accounting or enforcement is performed).



> And yes, at some point RT workloads need to be aware of the jitter
> injected by things like IRQs and such. But I believe the rationale was
> that for soft real-time workloads this current semantic was 'easier'
> because we get to ignore IRQ overhead for workload estimation etc.
> 
> What we could maybe do is track runtime in both rq_clock_task() and
> rq_clock() and detect where the rq_clock based one exceeds the period
> and then push out the deadline (and add runtime).
> 
> Maybe something along such lines; does that make sense?

Uhm... I have to study and test your patch... I'll comment on this
later.



			Thanks,
				Luca


> 
> ---
>  include/linux/sched.h   |  3 +++
>  kernel/sched/deadline.c | 53
> ++++++++++++++++++++++++++++++++----------------- 2 files changed, 38
> insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8f8a5418b627..6aec81cb3d2e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -522,6 +522,9 @@ struct sched_dl_entity {
>  	u64				deadline;	/*
> Absolute deadline for this instance	*/ unsigned
> int			flags;		/* Specifying the
> scheduler behaviour	*/ 
> +	u64				wallstamp;
> +	s64				walltime;
> +
>  	/*
>  	 * Some bool flags:
>  	 *
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 91e4202b0634..633c8f36c700 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -683,16 +683,7 @@ static void replenish_dl_entity(struct
> sched_dl_entity *dl_se, if (dl_se->dl_yielded && dl_se->runtime > 0)
>  		dl_se->runtime = 0;
>  
> -	/*
> -	 * We keep moving the deadline away until we get some
> -	 * available runtime for the entity. This ensures correct
> -	 * handling of situations where the runtime overrun is
> -	 * arbitrary large.
> -	 */
> -	while (dl_se->runtime <= 0) {
> -		dl_se->deadline += pi_se->dl_period;
> -		dl_se->runtime += pi_se->dl_runtime;
> -	}
> +	/* XXX what do we do with pi_se */
>  
>  	/*
>  	 * At this point, the deadline really should be "in
> @@ -1148,9 +1139,9 @@ static void update_curr_dl(struct rq *rq)
>  {
>  	struct task_struct *curr = rq->curr;
>  	struct sched_dl_entity *dl_se = &curr->dl;
> -	u64 delta_exec, scaled_delta_exec;
> +	u64 delta_exec, scaled_delta_exec, delta_wall;
>  	int cpu = cpu_of(rq);
> -	u64 now;
> +	u64 now, wall;
>  
>  	if (!dl_task(curr) || !on_dl_rq(dl_se))
>  		return;
> @@ -1171,6 +1162,17 @@ static void update_curr_dl(struct rq *rq)
>  		return;
>  	}
>  
> +	wall = rq_clock();
> +	delta_wall = wall - dl_se->wallstamp;
> +	if (delta_wall > 0) {
> +		dl_se->walltime += delta_wall;
> +		dl_se->wallstamp = wall;
> +	}
> +
> +	/* check if rq_clock_task() has been too slow */
> +	if (unlikely(dl_se->walltime > dl_se->period))
> +		goto throttle;
> +
>  	schedstat_set(curr->se.statistics.exec_max,
>  		      max(curr->se.statistics.exec_max, delta_exec));
>  
> @@ -1204,14 +1206,27 @@ static void update_curr_dl(struct rq *rq)
>  
>  	dl_se->runtime -= scaled_delta_exec;
>  
> -throttle:
>  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> +throttle:
>  		dl_se->dl_throttled = 1;
>  
> -		/* If requested, inform the user about runtime
> overruns. */
> -		if (dl_runtime_exceeded(dl_se) &&
> -		    (dl_se->flags & SCHED_FLAG_DL_OVERRUN))
> -			dl_se->dl_overrun = 1;
> +		if (dl_runtime_exceeded(dl_se)) {
> +			/* If requested, inform the user about
> runtime overruns. */
> +			if (dl_se->flags & SCHED_FLAG_DL_OVERRUN)
> +				dl_se->dl_overrun = 1;
> +
> +		}
> +
> +		/*
> +		 * We keep moving the deadline away until we get
> some available
> +		 * runtime for the entity. This ensures correct
> handling of
> +		 * situations where the runtime overrun is arbitrary
> large.
> +		 */
> +		while (dl_se->runtime <= 0 || dl_se->walltime >
> dl_se->period) {
> +			dl_se->deadline += dl_se->dl_period;
> +			dl_se->runtime  += dl_se->dl_runtime;
> +			dl_se->walltime -= dl_se->dl_period;
> +		}
>  
>  		__dequeue_task_dl(rq, curr, 0);
>  		if (unlikely(dl_se->dl_boosted
> || !start_dl_timer(curr))) @@ -1751,9 +1766,10 @@
> pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct
> rq_flags *rf) p = dl_task_of(dl_se);
>  	p->se.exec_start = rq_clock_task(rq);
> +	dl_se->wallstamp = rq_clock(rq);
>  
>  	/* Running task will never be pushed. */
> -       dequeue_pushable_dl_task(rq, p);
> +	dequeue_pushable_dl_task(rq, p);
>  
>  	if (hrtick_enabled(rq))
>  		start_hrtick_dl(rq, p);
> @@ -1811,6 +1827,7 @@ static void set_curr_task_dl(struct rq *rq)
>  	struct task_struct *p = rq->curr;
>  
>  	p->se.exec_start = rq_clock_task(rq);
> +	p->dl_se.wallstamp = rq_clock(rq);
>  
>  	/* You can't push away the running task */
>  	dequeue_pushable_dl_task(rq, p);


  reply	other threads:[~2018-10-30 11:08 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-13  7:31 INFO: rcu detected stall in do_idle syzbot
2018-10-16 13:24 ` Thomas Gleixner
2018-10-16 14:03   ` Peter Zijlstra
2018-10-16 14:41     ` Juri Lelli
2018-10-16 14:45       ` Thomas Gleixner
2018-10-16 15:36         ` Juri Lelli
2018-10-18  8:28           ` Juri Lelli
2018-10-18  9:48             ` Peter Zijlstra
2018-10-18 10:10               ` Juri Lelli
2018-10-18 10:38                 ` luca abeni
2018-10-18 10:33               ` luca abeni
2018-10-19 13:14                 ` Peter Zijlstra
2018-10-18 10:23             ` luca abeni
2018-10-18 10:47               ` Juri Lelli
2018-10-18 11:08                 ` luca abeni
2018-10-18 12:21                   ` Juri Lelli
2018-10-18 12:36                     ` luca abeni
2018-10-19 11:39                   ` Peter Zijlstra
2018-10-19 20:50                     ` luca abeni
2018-10-24 12:03                       ` Juri Lelli
2018-10-27 11:16                         ` Dmitry Vyukov
2018-10-28  8:33                           ` Juri Lelli
2018-10-30 10:45                         ` Peter Zijlstra
2018-10-30 11:08                           ` luca abeni [this message]
2018-10-31 16:18                             ` Daniel Bristot de Oliveira
2018-10-31 16:40                               ` Juri Lelli
2018-10-31 17:39                                 ` Peter Zijlstra
2018-10-31 17:58                                 ` Daniel Bristot de Oliveira
2018-11-01  5:55                                   ` Juri Lelli
2018-11-02 10:00                                     ` Daniel Bristot de Oliveira
2018-11-05 10:55                                       ` Juri Lelli
2018-11-07 10:12                                         ` Daniel Bristot de Oliveira
2018-10-31 17:38                               ` Peter Zijlstra
2018-10-30 11:12                           ` Juri Lelli
2018-11-06 11:44                             ` Juri Lelli
2021-10-27  0:59 Hao Sun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181030120804.2f30c2da@sweethome \
    --to=luca.abeni@santannapisa.it \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=claudio@evidence.eu.com \
    --cc=henrik@austad.us \
    --cc=hpa@zytor.com \
    --cc=juri.lelli@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nstange@suse.de \
    --cc=peterz@infradead.org \
    --cc=syzbot+385468161961cee80c31@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@santannapisa.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).