* [PATCH] sched/deadline: Fix bad accounting of nr_running
@ 2014-02-15  4:59 Steven Rostedt
  2014-02-15  9:52 ` Peter Zijlstra
  2014-02-17 15:47 ` [PATCH] " Juri Lelli
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2014-02-15  4:59 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Juri Lelli, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, Andrew Morton

My test suite was locking up hard when enabling mmiotracer. This was due
to the mmiotracer placing all but one CPU offline. I found this out
when I was able to reproduce the bug with just my stress-cpu-hotplug
test. This bug baffled me because it would not always trigger, and
would only trigger on the first run after boot up. The
stress-cpu-hotplug test would crash hard the first run, or never crash
at all. But a new reboot may cause it to crash on the first run again.

I spent all week bisecting this, as I couldn't find a consistent
reproducer. I finally narrowed it down to the sched deadline patches,
and even more peculiar, to the commit that added the sched
deadline boot up self test to the latency tracer. Then it dawned on me
what the bug was.

All it took was to run a task under sched deadline to screw up the CPU
hot plugging. This explained why it would lock up only on the first run
of the stress-cpu-hotplug test. The bug happened when the boot up self
test of the schedule latency tracer would test a deadline task. The
deadline task would corrupt something that would cause CPU hotplug to
fail. If it didn't corrupt it, the stress test would always work
(there are no other sched deadline tasks that would run to cause
problems). If it did corrupt on boot up, the first test would lock up
hard.

I proved this theory by running my deadline test program on another box,
and then running the stress-cpu-hotplug test; it would now consistently
lock up. I could run stress-cpu-hotplug over and over with no problem,
but once I ran the deadline test, the next run of the
stress-cpu-hotplug would lock hard.

After adding lots of tracing to the code, I found the cause. The
function tracer showed that migrate_tasks() was stuck in an infinite
loop, where rq->nr_running never equaled 1 to break out of it. When I
added a trace_printk() to see what that number was, it was 335 and
never decrementing!
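
For reference, the loop in migrate_tasks() that was spinning has roughly
this shape (a condensed sketch, not the exact code):

	for ( ; ; ) {
		/*
		 * Only the per-cpu stop task should be left at this point,
		 * but with an over-counted rq->nr_running this condition is
		 * never true and the loop never exits.
		 */
		if (rq->nr_running == 1)
			break;

		next = pick_next_task(rq);
		/* [snip: migrate @next to an online CPU, then loop] */
	}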

Looking at the deadline code I found:

static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int
flags) {
	dequeue_dl_entity(&p->dl);
	dequeue_pushable_dl_task(rq, p);
}

static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
flags) {
	update_curr_dl(rq);
	__dequeue_task_dl(rq, p, flags);

	dec_nr_running(rq);
}

And this:

	if (dl_runtime_exceeded(rq, dl_se)) {
		__dequeue_task_dl(rq, curr, 0);
		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
			dl_se->dl_throttled = 1;
		else
			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

		if (!is_leftmost(curr, &rq->dl))
			resched_task(curr);
	}

Notice how we call __dequeue_task_dl() and in the else case we
call enqueue_task_dl()? Also notice that the dequeue is the underscored
__dequeue_task_dl() while the enqueue is the plain enqueue_task_dl().
The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl()
does not. This is where nr_running gets out of sync.
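
Put differently, on the replenish path above the calls pair up like this
(annotated by hand, not actual output):

	__dequeue_task_dl(rq, curr, 0);			/* no dec_nr_running()   */
	enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);	/* does inc_nr_running() */

	/* net effect: rq->nr_running grows by one and never comes back down */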

By moving the dec_nr_running() from dequeue_task_dl() to
__dequeue_task_dl(), everything works again. That is, I can run the
deadline test program and then run the stress-cpu-hotplug test, and
everything is fine.

For reference on my test programs:

  http://rostedt.homelinux.com/private/stress-cpu-hotplug
  http://rostedt.homelinux.com/private/deadline.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..84c2454 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
struct task_struct *p, int flags) {
 	dequeue_dl_entity(&p->dl);
 	dequeue_pushable_dl_task(rq, p);
+
+	dec_nr_running(rq);
 }
 
 static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
flags) {
 	update_curr_dl(rq);
 	__dequeue_task_dl(rq, p, flags);
-
-	dec_nr_running(rq);
 }
 
 /*


* Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
  2014-02-15  4:59 [PATCH] sched/deadline: Fix bad accounting of nr_running Steven Rostedt
@ 2014-02-15  9:52 ` Peter Zijlstra
  2014-02-15 13:03   ` Steven Rostedt
  2014-02-15 13:08   ` [PATCH v2] " Steven Rostedt
  2014-02-17 15:47 ` [PATCH] " Juri Lelli
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2014-02-15  9:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Juri Lelli, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Andrew Morton

On Fri, Feb 14, 2014 at 11:59:46PM -0500, Steven Rostedt wrote:
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..84c2454 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
>  	dequeue_dl_entity(&p->dl);
>  	dequeue_pushable_dl_task(rq, p);
> +
> +	dec_nr_running(rq);
>  }
>  
>  static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
>  	update_curr_dl(rq);
>  	__dequeue_task_dl(rq, p, flags);
> -
> -	dec_nr_running(rq);
>  }
>  
>  /*

That patch is so line-wrapped... tskk!


* Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
  2014-02-15  9:52 ` Peter Zijlstra
@ 2014-02-15 13:03   ` Steven Rostedt
  2014-02-15 13:08   ` [PATCH v2] " Steven Rostedt
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2014-02-15 13:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Juri Lelli, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Andrew Morton

On Sat, 15 Feb 2014 10:52:07 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 14, 2014 at 11:59:46PM -0500, Steven Rostedt wrote:
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 0dd5e09..84c2454 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
> > struct task_struct *p, int flags) {
> >  	dequeue_dl_entity(&p->dl);
> >  	dequeue_pushable_dl_task(rq, p);
> > +
> > +	dec_nr_running(rq);
> >  }
> >  
> >  static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> > flags) {
> >  	update_curr_dl(rq);
> >  	__dequeue_task_dl(rq, p, flags);
> > -
> > -	dec_nr_running(rq);
> >  }
> >  
> >  /*
> 
> That patch is so line-wrapped... tskk!

Damn it, that's Claws Mail. It inserts files (patches) fine, but if you
go and modify something, it decides to "autowrap" the entire file.
That's one of the things I need to fix when I get the time (I've
downloaded the source as this frustrates the hell out of me). The
autowrapping is the thing that I hate most about this client.

I'll send out a new patch later :-/

-- Steve


* [PATCH v2] sched/deadline: Fix bad accounting of nr_running
  2014-02-15  9:52 ` Peter Zijlstra
  2014-02-15 13:03   ` Steven Rostedt
@ 2014-02-15 13:08   ` Steven Rostedt
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2014-02-15 13:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Juri Lelli, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Andrew Morton

My test suite was locking up hard when enabling mmiotracer. This was due
to the mmiotracer placing all but one CPU offline. I found this out
when I was able to reproduce the bug with just my stress-cpu-hotplug
test. This bug baffled me because it would not always trigger, and
would only trigger on the first run after boot up. The
stress-cpu-hotplug test would crash hard the first run, or never crash
at all. But a new reboot may cause it to crash on the first run again.

I spent all week bisecting this, as I couldn't find a consistent
reproducer. I finally narrowed it down to the sched deadline patches,
and even more peculiar, to the commit that added the sched
deadline boot up self test to the latency tracer. Then it dawned on me
what the bug was.

All it took was to run a task under sched deadline to screw up the CPU
hot plugging. This explained why it would lock up only on the first run
of the stress-cpu-hotplug test. The bug happened when the boot up self
test of the schedule latency tracer would test a deadline task. The
deadline task would corrupt something that would cause CPU hotplug to
fail. If it didn't corrupt it, the stress test would always work
(there are no other sched deadline tasks that would run to cause
problems). If it did corrupt on boot up, the first test would lock up
hard.

I proved this theory by running my deadline test program on another box,
and then running the stress-cpu-hotplug test; it would now consistently
lock up. I could run stress-cpu-hotplug over and over with no problem,
but once I ran the deadline test, the next run of the
stress-cpu-hotplug would lock hard.

After adding lots of tracing to the code, I found the cause. The
function tracer showed that migrate_tasks() was stuck in an infinite
loop, where rq->nr_running never equaled 1 to break out of it. When I
added a trace_printk() to see what that number was, it was 335 and
never decrementing!

Looking at the deadline code I found:

static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
	dequeue_dl_entity(&p->dl);
	dequeue_pushable_dl_task(rq, p);
}

static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
	update_curr_dl(rq);
	__dequeue_task_dl(rq, p, flags);

	dec_nr_running(rq);
}

And this:

	if (dl_runtime_exceeded(rq, dl_se)) {
		__dequeue_task_dl(rq, curr, 0);
		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
			dl_se->dl_throttled = 1;
		else
			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

		if (!is_leftmost(curr, &rq->dl))
			resched_task(curr);
	}

Notice how we call __dequeue_task_dl() and in the else case we
call enqueue_task_dl()? Also notice that the dequeue is the underscored
__dequeue_task_dl() while the enqueue is the plain enqueue_task_dl().
The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl()
does not. This is where nr_running gets out of sync.

By moving the dec_nr_running() from dequeue_task_dl() to
__dequeue_task_dl(), everything works again. That is, I can run the
deadline test program and then run the stress-cpu-hotplug test, and
everything is fine.

For reference on my test programs:

  http://rostedt.homelinux.com/private/stress-cpu-hotplug
  http://rostedt.homelinux.com/private/deadline.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---

v2. cleaned up claws mail line wrapping crap

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..84c2454 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	dequeue_dl_entity(&p->dl);
 	dequeue_pushable_dl_task(rq, p);
+
+	dec_nr_running(rq);
 }
 
 static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	update_curr_dl(rq);
 	__dequeue_task_dl(rq, p, flags);
-
-	dec_nr_running(rq);
 }
 
 /*


* Re: [PATCH] sched/deadline: Fix bad accounting of nr_running
  2014-02-15  4:59 [PATCH] sched/deadline: Fix bad accounting of nr_running Steven Rostedt
  2014-02-15  9:52 ` Peter Zijlstra
@ 2014-02-17 15:47 ` Juri Lelli
  2014-02-19  2:50   ` [PATCH v3] " Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Juri Lelli @ 2014-02-17 15:47 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Andrew Morton

Hi,

On 02/15/2014 05:59 AM, Steven Rostedt wrote:
> My test suite was locking up hard when enabling mmiotracer. This was due
> to the mmiotracer placing all but one CPU offline. I found this out
> when I was able to reproduce the bug with just my stress-cpu-hotplug
> test. This bug baffled me because it would not always trigger, and
> would only trigger on the first run after boot up. The
> stress-cpu-hotplug test would crash hard the first run, or never crash
> at all. But a new reboot may cause it to crash on the first run again.
> 
> I spent all week bisecting this, as I couldn't find a consistent
> reproducer. I finally narrowed it down to the sched deadline patches,
> and even more peculiar, to the commit that added the sched
> deadline boot up self test to the latency tracer. Then it dawned on me
> what the bug was.
> 
> All it took was to run a task under sched deadline to screw up the CPU
> hot plugging. This explained why it would lock up only on the first run
> of the stress-cpu-hotplug test. The bug happened when the boot up self
> test of the schedule latency tracer would test a deadline task. The
> deadline task would corrupt something that would cause CPU hotplug to
> fail. If it didn't corrupt it, the stress test would always work
> (there are no other sched deadline tasks that would run to cause
> problems). If it did corrupt on boot up, the first test would lock up
> hard.
> 
> I proved this theory by running my deadline test program on another box,
> and then running the stress-cpu-hotplug test; it would now consistently
> lock up. I could run stress-cpu-hotplug over and over with no problem,
> but once I ran the deadline test, the next run of the
> stress-cpu-hotplug would lock hard.
> 
> After adding lots of tracing to the code, I found the cause. The
> function tracer showed that migrate_tasks() was stuck in an infinite
> loop, where rq->nr_running never equaled 1 to break out of it. When I
> added a trace_printk() to see what that number was, it was 335 and
> never decrementing!
> 
> Looking at the deadline code I found:
> 
> static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> 	dequeue_dl_entity(&p->dl);
> 	dequeue_pushable_dl_task(rq, p);
> }
> 
> static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
> 	update_curr_dl(rq);
> 	__dequeue_task_dl(rq, p, flags);
> 
> 	dec_nr_running(rq);
> }
> 
> And this:
> 
> 	if (dl_runtime_exceeded(rq, dl_se)) {
> 		__dequeue_task_dl(rq, curr, 0);
> 		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
> 			dl_se->dl_throttled = 1;
> 		else
> 			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
> 
> 		if (!is_leftmost(curr, &rq->dl))
> 			resched_task(curr);
> 	}
> 
> Notice how we call __dequeue_task_dl() and in the else case we
> call enqueue_task_dl()? Also notice that the dequeue is the underscored
> __dequeue_task_dl() while the enqueue is the plain enqueue_task_dl().
> The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl()
> does not. This is where nr_running gets out of sync.
> 

Right. I'd add another place that could cause this misalignment:

static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
{
[snip]
        dl_se->dl_throttled = 0;
        if (p->on_rq) {
                enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
                if (task_has_dl_policy(rq->curr))
                        check_preempt_curr_dl(rq, p, 0);
                else
                        resched_task(rq->curr);
[snip]
}

This is called when the replenishment timer for a throttled task fires,
and we have to queue it back on the dl_rq with recharged parameters.
The best test for this bug is to run a while(1) task with SCHED_DEADLINE
(using, for example, https://github.com/jlelli/schedtool-dl). This causes
a lot of throttle/replenish events and makes nr_running explode.
All is OK with this fix.
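
For reference, a minimal standalone busy-loop reproducer looks roughly like
this (a sketch only: struct sched_attr is declared by hand since glibc has no
wrapper, __NR_sched_setattr has to come from recent kernel headers, and the
runtime/deadline/period values are arbitrary):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE	6
#endif

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* ns */
	uint64_t sched_deadline;	/* ns */
	uint64_t sched_period;		/* ns */
};

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_DEADLINE,
		.sched_runtime	= 10ULL * 1000 * 1000,	/* 10ms of budget ... */
		.sched_deadline	= 100ULL * 1000 * 1000,	/* ... every 100ms    */
		.sched_period	= 100ULL * 1000 * 1000,
	};

	if (syscall(__NR_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	for (;;)
		;	/* burn the whole budget -> constant throttle/replenish */
}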

> By moving the dec_nr_running() from dequeue_task_dl() to
> __dequeue_task_dl(), everything works again. That is, I can run the
> deadline test program and then run the stress-cpu-hotplug test, and
> everything is fine.
> 

Rationale for this odd behavior is that, when a task is throttled, it
is removed only from the dl_rq, but we keep it on_rq (as this is not
a "full dequeue", that is the task is not actually sleeping). But, it
is also true that, while throttled a task behaves like it is sleeping
(e.g., its timer will fire on a new CPU if the old one is dead). So,
Steven's fix also sounds semantically correct.

Thanks!

Best,

- Juri

> For reference on my test programs:
> 
>   http://rostedt.homelinux.com/private/stress-cpu-hotplug
>   http://rostedt.homelinux.com/private/deadline.c
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..84c2454 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -844,14 +844,14 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
>  	dequeue_dl_entity(&p->dl);
>  	dequeue_pushable_dl_task(rq, p);
> +
> +	dec_nr_running(rq);
>  }
>  
>  static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int
> flags) {
>  	update_curr_dl(rq);
>  	__dequeue_task_dl(rq, p, flags);
> -
> -	dec_nr_running(rq);
>  }
>  
>  /*
> 


* [PATCH v3] sched/deadline: Fix bad accounting of nr_running
  2014-02-17 15:47 ` [PATCH] " Juri Lelli
@ 2014-02-19  2:50   ` Steven Rostedt
  2014-02-19  8:46     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2014-02-19  2:50 UTC (permalink / raw)
  To: Juri Lelli
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, Andrew Morton

 
> Rationale for this odd behavior is that, when a task is throttled, it
> is removed only from the dl_rq, but we keep it on_rq (as this is not
> a "full dequeue", that is the task is not actually sleeping). But, it
> is also true that, while throttled a task behaves like it is sleeping
> (e.g., its timer will fire on a new CPU if the old one is dead). So,
> Steven's fix also sounds semantically correct.

Actually, it seems that I was hitting it again, but this time getting a
negative number. OK, after looking at the code a bit more, I think we
should update the runqueue nr_running only when the task is officially
enqueued and dequeued, and all accounting within will not touch that
number.

----
My test suite was locking up hard when enabling mmiotracer. This was due
to the mmiotracer placing all but one CPU offline. I found this out
when I was able to reproduce the bug with just my stress-cpu-hotplug
test. This bug baffled me because it would not always trigger, and
would only trigger on the first run after boot up. The
stress-cpu-hotplug test would crash hard the first run, or never crash
at all. But a new reboot may cause it to crash on the first run again.

I spent all week bisecting this, as I couldn't find a consistent
reproducer. I finally narrowed it down to the sched deadline patches,
and even more peculiar, to the commit that added the sched
deadline boot up self test to the latency tracer. Then it dawned on me
what the bug was.

All it took was to run a task under sched deadline to screw up the CPU
hot plugging. This explained why it would lock up only on the first run
of the stress-cpu-hotplug test. The bug happened when the boot up self
test of the schedule latency tracer would test a deadline task. The
deadline task would corrupt something that would cause CPU hotplug to
fail. If it didn't corrupt it, the stress test would always work
(there are no other sched deadline tasks that would run to cause
problems). If it did corrupt on boot up, the first test would lock up
hard.

I proved this theory by running my deadline test program on another box,
and then running the stress-cpu-hotplug test; it would now consistently
lock up. I could run stress-cpu-hotplug over and over with no problem,
but once I ran the deadline test, the next run of the
stress-cpu-hotplug would lock hard.

After adding lots of tracing to the code, I found the cause. The
function tracer showed that migrate_tasks() was stuck in an infinite
loop, where rq->nr_running never equaled 1 to break out of it. When I
added a trace_printk() to see what that number was, it was 335 and
never decrementing!

Looking at the deadline code I found:

static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
	dequeue_dl_entity(&p->dl);
	dequeue_pushable_dl_task(rq, p);
}

static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
	update_curr_dl(rq);
	__dequeue_task_dl(rq, p, flags);

	dec_nr_running(rq);
}

And this:

	if (dl_runtime_exceeded(rq, dl_se)) {
		__dequeue_task_dl(rq, curr, 0);
		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
			dl_se->dl_throttled = 1;
		else
			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

		if (!is_leftmost(curr, &rq->dl))
			resched_task(curr);
	}

Notice how we call __dequeue_task_dl() and in the else case we
call enqueue_task_dl()? Also notice that the dequeue is the underscored
__dequeue_task_dl() while the enqueue is the plain enqueue_task_dl().
The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl()
does not. This is where nr_running gets out of sync.

By only updating the nr_running when the task is officially enqueued or
dequeued by the core scheduler, we can prevent the accounting of that
number from going out of sync.

Link: http://lkml.kernel.org/r/20140214235946.60a89b65@gandalf.local.home

For reference on my test programs:

  http://rostedt.homelinux.com/private/stress-cpu-hotplug
  http://rostedt.homelinux.com/private/deadline.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
v3. Only update nr_running in enqueue and dequeue of core scheduler calls

v2. cleaned up claws mail line wrapping crap

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..27cb282 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -238,7 +238,7 @@ void dec_dl_migration(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 
 #endif /* CONFIG_SMP */
 
-static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
+static void __enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags);
 static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p,
 				  int flags);
@@ -510,7 +510,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	update_rq_clock(rq);
 	dl_se->dl_throttled = 0;
 	if (p->on_rq) {
-		enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
+		__enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
 		if (task_has_dl_policy(rq->curr))
 			check_preempt_curr_dl(rq, p, 0);
 		else
@@ -608,7 +608,7 @@ static void update_curr_dl(struct rq *rq)
 		if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
 			dl_se->dl_throttled = 1;
 		else
-			enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
+			__enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
 
 		if (!is_leftmost(curr, &rq->dl))
 			resched_task(curr);
@@ -809,7 +809,7 @@ static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
 	__dequeue_dl_entity(dl_se);
 }
 
-static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+static void __enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *pi_task = rt_mutex_get_top_task(p);
 	struct sched_dl_entity *pi_se = &p->dl;
@@ -836,7 +836,11 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
+}
 
+static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+{
+	__enqueue_task_dl(rq, p, flags);
 	inc_nr_running(rq);
 }
 


* Re: [PATCH v3] sched/deadline: Fix bad accounting of nr_running
  2014-02-19  2:50   ` [PATCH v3] " Steven Rostedt
@ 2014-02-19  8:46     ` Peter Zijlstra
  2014-02-19 10:32       ` Juri Lelli
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2014-02-19  8:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juri Lelli, LKML, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Andrew Morton

On Tue, Feb 18, 2014 at 09:50:12PM -0500, Steven Rostedt wrote:
>  
> > Rationale for this odd behavior is that, when a task is throttled, it
> > is removed only from the dl_rq, but we keep it on_rq (as this is not
> > a "full dequeue", that is the task is not actually sleeping). But, it
> > is also true that, while throttled a task behaves like it is sleeping
> > (e.g., its timer will fire on a new CPU if the old one is dead). So,
> > Steven's fix also sounds semantically correct.
> 
> Actually, it seems that I was hitting it again, but this time getting a
> negative number. OK, after looking at the code a bit more, I think we
> should update the runqueue nr_running only when the task is officially
> > enqueued and dequeued, and all accounting within will not touch that
> number.

But if the task is throttled it should still very much decrement the
number. There are places that very much rely on nr_running being exactly the
number of runnable tasks.


* Re: [PATCH v3] sched/deadline: Fix bad accounting of nr_running
  2014-02-19  8:46     ` Peter Zijlstra
@ 2014-02-19 10:32       ` Juri Lelli
  2014-02-19 13:14         ` Juri Lelli
  0 siblings, 1 reply; 10+ messages in thread
From: Juri Lelli @ 2014-02-19 10:32 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: LKML, Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andrew Morton

On 02/19/2014 09:46 AM, Peter Zijlstra wrote:
> On Tue, Feb 18, 2014 at 09:50:12PM -0500, Steven Rostedt wrote:
>>  
>>> Rationale for this odd behavior is that, when a task is throttled, it
>>> is removed only from the dl_rq, but we keep it on_rq (as this is not
>>> a "full dequeue", that is the task is not actually sleeping). But, it
>>> is also true that, while throttled a task behaves like it is sleeping
>>> (e.g., its timer will fire on a new CPU if the old one is dead). So,
>>> Steven's fix also sounds semantically correct.
>>
>> Actually, it seems that I was hitting it again, but this time getting a
>> negative number. OK, after looking at the code a bit more, I think we
>> should update the runqueue nr_running only when the task is officially
>> enqueued and dequeued, and all accounting within will not touch that
>> number.

This is a different way to get the same result (mildly tested on my box):

---
 kernel/sched/deadline.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..675dad3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -837,7 +837,8 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
        if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
                enqueue_pushable_dl_task(rq, p);

-       inc_nr_running(rq);
+       if (!(flags & ENQUEUE_REPLENISH))
+               inc_nr_running(rq);
 }

 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
--

We touch nr_running only when we don't enqueue back as a consequence
of a replenishment.

> 
> But if the task is throttled it should still very much decrement the
> number. There are places that very much rely on nr_running being exactly the
> number of runnable tasks.
> 

This is a different thing, and V2 seemed to implement this behavior
(that's why I said it looked semantically correct).

Thanks,

- Juri


* Re: [PATCH v3] sched/deadline: Fix bad accounting of nr_running
  2014-02-19 10:32       ` Juri Lelli
@ 2014-02-19 13:14         ` Juri Lelli
  2014-02-19 17:45           ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Juri Lelli @ 2014-02-19 13:14 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: LKML, Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andrew Morton

On 02/19/2014 11:32 AM, Juri Lelli wrote:
> On 02/19/2014 09:46 AM, Peter Zijlstra wrote:
>> On Tue, Feb 18, 2014 at 09:50:12PM -0500, Steven Rostedt wrote:
>>>  
>>>> Rationale for this odd behavior is that, when a task is throttled, it
>>>> is removed only from the dl_rq, but we keep it on_rq (as this is not
>>>> a "full dequeue", that is the task is not actually sleeping). But, it
>>>> is also true that, while throttled a task behaves like it is sleeping
>>>> (e.g., its timer will fire on a new CPU if the old one is dead). So,
>>>> Steven's fix also sounds semantically correct.
>>>
>>> Actually, it seems that I was hitting it again, but this time getting a
>>> negative number. OK, after looking at the code a bit more, I think we
>>> should update the runqueue nr_running only when the task is officially
>>> enqueued and dequeued, and all accounting within will not touch that
>>> number.
> 
> This is a different way to get the same result (mildly tested on my box):
> 
> ---
>  kernel/sched/deadline.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0dd5e09..675dad3 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -837,7 +837,8 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>         if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>                 enqueue_pushable_dl_task(rq, p);
> 
> -       inc_nr_running(rq);
> +       if (!(flags & ENQUEUE_REPLENISH))
> +               inc_nr_running(rq);
>  }
> 
>  static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> --
> 
> We touch nr_running only when we don't enqueue back as a consequence
> of a replenishment.
> 
>>
>> But if the task is throttled it should still very much decrement the
>> number. There are places that very much rely on nr_running being exactly the
>> number of runnable tasks.
>>
> 
> This is a different thing, and V2 seemed to implement this behavior
> (that's why I said it looked semantically correct).
> 

So, both my last approach and Steven's V2 were causing nr_running to
become negative, as they double decrement it when dequeuing a task that
also exceeded its budget.

What follows seems to solve the issue, and correctly account for throttled
tasks as !nr_running.

---
 kernel/sched/deadline.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dd5e09..b819577 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -717,6 +717,7 @@ void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)

        WARN_ON(!dl_prio(prio));
        dl_rq->dl_nr_running++;
+       inc_nr_running(rq_of_dl_rq(dl_rq));

        inc_dl_deadline(dl_rq, deadline);
        inc_dl_migration(dl_se, dl_rq);
@@ -730,6 +731,7 @@ void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
        WARN_ON(!dl_prio(prio));
        WARN_ON(!dl_rq->dl_nr_running);
        dl_rq->dl_nr_running--;
+       dec_nr_running(rq_of_dl_rq(dl_rq));

        dec_dl_deadline(dl_rq, dl_se->deadline);
        dec_dl_migration(dl_se, dl_rq);
@@ -836,8 +838,6 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)

        if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
                enqueue_pushable_dl_task(rq, p);
-
-       inc_nr_running(rq);
 }

 static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
@@ -850,8 +850,6 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
        update_curr_dl(rq);
        __dequeue_task_dl(rq, p, flags);
-
-       dec_nr_running(rq);
 }

 /*
-- 
1.7.9.5

Steven, could you test it?

Thanks,

- Juri


* Re: [PATCH v3] sched/deadline: Fix bad accounting of nr_running
  2014-02-19 13:14         ` Juri Lelli
@ 2014-02-19 17:45           ` Steven Rostedt
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2014-02-19 17:45 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, LKML, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, Andrew Morton

On Wed, 19 Feb 2014 14:14:54 +0100
Juri Lelli <juri.lelli@gmail.com> wrote:

> Steven, could you test it?

OK, I tested it on my x86_64 box that was also causing troubles.

Please fix up the whitespace issues and send a formal patch to Peter.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Added-bys-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

