From: Juri Lelli <juri.lelli@redhat.com>
To: luca abeni <luca.abeni@santannapisa.it>
Cc: "chengjian (D)" <cj.chengjian@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Li Bin <huawei.libin@huawei.com>,
	"Xiexiuqi (Xie XiuQi)" <xiexiuqi@huawei.com>,
	mingo@redhat.com, Peter Zijlstra <peterz@infradead.org>
Subject: Re: WARN ON at kernel/sched/deadline.c task_non_contending
Date: Fri, 22 Mar 2019 15:32:32 +0100
Message-ID: <20190322143232.GI8775@localhost.localdomain>
In-Reply-To: <20190313154948.773427d6@luca64>

Hi,

On 13/03/19 15:49, luca abeni wrote:
> Hi,
> 
> (I added Juri in cc)
> 
> On Tue, 12 Mar 2019 10:03:12 +0800
> "chengjian (D)" <cj.chengjian@huawei.com> wrote:
> [...]
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 31c050a0d0ce..d73cb033a06d 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -252,7 +252,6 @@ static void task_non_contending(struct task_struct *p)
> >          if (dl_entity_is_special(dl_se))
> >                  return;
> > 
> > -       WARN_ON(hrtimer_active(&dl_se->inactive_timer));
> >          WARN_ON(dl_se->dl_non_contending);
> > 
> >          zerolag_time = dl_se->deadline -
> > @@ -287,7 +286,9 @@ static void task_non_contending(struct task_struct *p)
> >          }
> > 
> >          dl_se->dl_non_contending = 1;
> > -       get_task_struct(p);
> > +
> > +       if (!hrtimer_active(&dl_se->inactive_timer));
> > +               get_task_struct(p);
> >          hrtimer_start(timer, ns_to_ktime(zerolag_time), HRTIMER_MODE_REL);
> >  }
> 
> After looking at the patch a little bit more and running some tests,
> I suspect this solution might be racy:
> when the timer is already active, (and hrtimer_start() fails), it
> relies on its handler to decrease the running bw (by setting
> dl_non_contending to 1)... But inactive_task_timer() might have
> already checked dl_non_contending, finding it equal to 0 (so, it
> ends up doing nothing and the running bw is not decreased).
> 
> 
> So, I would prefer a different solution. I think this patch should work:
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 6a73e41a2016..43901fa3f269 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -252,7 +252,6 @@ static void task_non_contending(struct task_struct *p)
>  	if (dl_entity_is_special(dl_se))
>  		return;
>  
> -	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
>  	WARN_ON(dl_se->dl_non_contending);
>  
>  	zerolag_time = dl_se->deadline -
> @@ -269,7 +268,7 @@ static void task_non_contending(struct task_struct *p)
>  	 * If the "0-lag time" already passed, decrease the active
>  	 * utilization now, instead of starting a timer
>  	 */
> -	if (zerolag_time < 0) {
> +	if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
>  		if (dl_task(p))
>  			sub_running_bw(dl_se, dl_rq);
>  		if (!dl_task(p) || p->state == TASK_DEAD) {
> 
> 
> The idea is that if the timer is active, we leave dl_non_contending set to
> 0 (so that the timer handler does nothing), and we immediately decrease the
> running bw.
> I think this is OK, because this situation can happen only if the task
> blocks, wakes up while the timer handler is running, and then immediately
> blocks again - while the timer handler is still running. So, the "zero lag
> time" cannot be too much in the future.

And if we get here and the handler is running, it means that the handler
is spinning on rq->lock, waiting for the dequeue to release it. So, this
looks safe to me as well.
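
A minimal userspace sketch of the sequence discussed above may help to
check the accounting. It is not kernel code: struct dl_model, the
handler_running flag and all model_* names are hypothetical, and rq->lock
is modelled only by the order of the calls. It assumes a single task that
blocks, wakes up and blocks again while the timer handler is still
waiting for the lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one -deadline task; not kernel code. */
struct dl_model {
	long running_bw;        /* active utilization of the runqueue */
	long task_bw;           /* bandwidth of the modelled task */
	bool dl_non_contending; /* handler must drop the bw when it runs */
	bool timer_active;      /* stands in for hrtimer_active() */
	bool handler_running;   /* handler fired, still waiting for rq->lock */
};

/* Task blocks: models the patched branch of task_non_contending(). */
static void model_task_non_contending(struct dl_model *m, long zerolag_time)
{
	if (zerolag_time < 0 || m->timer_active) {
		/* Drop the bw now; dl_non_contending stays 0, so a handler
		 * that is still pending finds nothing to do. */
		m->running_bw -= m->task_bw;
		return;
	}
	m->dl_non_contending = true;
	m->timer_active = true; /* stands in for hrtimer_start() */
}

/* Task wakes up: models task_contending(). */
static void model_task_contending(struct dl_model *m)
{
	if (m->dl_non_contending) {
		/* The bw was never dropped, so only undo the flag. */
		m->dl_non_contending = false;
		/* Cancelling fails while the handler is running. */
		if (!m->handler_running)
			m->timer_active = false;
		return;
	}
	m->running_bw += m->task_bw;
}

/* Handler body once it gets rq->lock: models inactive_task_timer(). */
static void model_inactive_task_timer(struct dl_model *m)
{
	if (m->dl_non_contending) {
		m->running_bw -= m->task_bw;
		m->dl_non_contending = false;
	}
	m->timer_active = false;
	m->handler_running = false;
}

/* Block, wake up and block again while the handler waits for the lock. */
long model_double_block(long task_bw)
{
	struct dl_model m = { .running_bw = task_bw, .task_bw = task_bw };

	model_task_non_contending(&m, 100); /* first block: timer armed */
	m.handler_running = true;           /* timer fires, handler spins */
	model_task_contending(&m);          /* wakeup: cancel fails */
	assert(m.timer_active && !m.dl_non_contending);
	model_task_non_contending(&m, 100); /* second block: bw dropped now */
	model_inactive_task_timer(&m);      /* handler runs and does nothing */
	return m.running_bw;
}
```

In this model the running bw is subtracted exactly once, by the second
block, and the late handler leaves it alone, so model_double_block()
ends with a running bw of 0 for any task bandwidth.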

BTW, I could reproduce with Steve's deadline_test [1], and this seems to
fix it.

Would you mind sending out a proper patch, Luca?

Thanks!

- Juri

[1] https://goo.gl/fVbRSu
