linux-kernel.vger.kernel.org archive mirror
* Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
@ 2012-08-16 13:45 Rakib Mullick
  2012-08-16 13:56 ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-16 13:45 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel

 
When a CPU is about to go down, it moves all of its sleeping tasks to
an active CPU, and its nr_uninterruptible count is moved as well.
Currently, the nr_uninterruptible count is moved to a randomly picked
CPU from the active CPU mask, which keeps the global nr_uninterruptible
count intact. But it would be more precise to move the count to the CPU
where the sleeping tasks were moved, and it might also have a subtle
impact on the rq's load calculation. This patch addresses that issue.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 82ad284..5839796 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5304,9 +5304,9 @@ void idle_task_exit(void)
  * their home CPUs. So we just add the counter to another CPU's counter,
  * to keep the global sum constant after CPU-down:
  */
-static void migrate_nr_uninterruptible(struct rq *rq_src)
+static void migrate_nr_uninterruptible(struct rq *rq_src, unsigned int dest_cpu)
 {
-	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
+	struct rq *rq_dest = cpu_rq(dest_cpu);
 
 	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
 	rq_src->nr_uninterruptible = 0;
@@ -5371,6 +5371,7 @@ static void migrate_tasks(unsigned int dead_cpu)
 	}
 
 	rq->stop = stop;
+	migrate_nr_uninterruptible(rq, dest_cpu);
 }
 
 #endif /* CONFIG_HOTPLUG_CPU */
@@ -5612,7 +5613,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
 
-		migrate_nr_uninterruptible(rq);
 		calc_global_load_remove(rq);
 		break;
 #endif



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 13:45 Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down Rakib Mullick
@ 2012-08-16 13:56 ` Peter Zijlstra
  2012-08-16 14:28   ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2012-08-16 13:56 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: mingo, linux-kernel, paulmck

On Thu, 2012-08-16 at 19:45 +0600, Rakib Mullick wrote:
> When a CPU is about to go down, it moves all it's sleeping task to an active CPU, then nr_uninterruptible counts are
> also moved. When moving nr_uninterruptible count, currently it chooses a randomly picked CPU from the active CPU mask
> to keep the global nr_uninterruptible count intact. But, it would be precise to move nr_uninterruptible counts to the
> CPU where all the sleeping tasks were moved and it also might have subtle impact over rq's load calculation. So, this
> patch is prepared to address this issue.

The changelog is ill-formatted. Other than that, the patch doesn't appear
to actually do what it says. The sleeping tasks can be scattered to any
number of cpus as decided by select_fallback_rq().

Furthermore there should be absolutely no impact on load calculation
whatsoever. nr_uninterruptible is only ever useful as a sum over all
cpus; this total sum doesn't change regardless of where you put the
value.

Worse, there's absolutely no relation between the tasks on the runqueue
(sleeping or otherwise) and nr_uninterruptible, so coupling these
actions makes no sense whatsoever.



* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 13:56 ` Peter Zijlstra
@ 2012-08-16 14:28   ` Rakib Mullick
  2012-08-16 14:42     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-16 14:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulmck

On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2012-08-16 at 19:45 +0600, Rakib Mullick wrote:
>> When a CPU is about to go down, it moves all it's sleeping task to an
>> active CPU, then nr_uninterruptible counts are
>> also moved. When moving nr_uninterruptible count, currently it chooses a
>> randomly picked CPU from the active CPU mask
>> to keep the global nr_uninterruptible count intact. But, it would be
>> precise to move nr_uninterruptible counts to the
>> CPU where all the sleeping tasks were moved and it also might have subtle
>> impact over rq's load calculation. So, this
>> patch is prepared to address this issue.
>
> The Changelog is ill-formated. Other than that, the patch doesn't appear
> to actually do what it says. The sleeping tasks can be scattered to any
> number of cpus as decided by select_fallback_rq().
>
I'm not sure which parts of the changelog don't match the patch. And
this patch assumes that sleeping tasks won't be scattered. Per
select_fallback_rq(), sleeping tasks might get scattered in a few
cases, e.g. if a CPU is down or a task isn't allowed to move to a
particular CPU. Other than that, the dest cpu is supposed to be the
same.

> Furthermore there should be absolutely no impact on load calculation
> what so ever. nr_uninterruptible is only ever useful as a sum over all
> cpus, this total sum doesn't change regardless of where you put the
> value.
>
> Worse, there's absolutely no relation to the tasks on the runqueue
> (sleeping or otherwise) and nr_uninterruptible, so coupling these
> actions makes no sense what so ever.
>
nr_uninterruptible is coupled with the tasks on the runqueue to
calculate the nr_active number. In calc_load_fold_active(), this
nr_active number is used to calculate the delta. This is how I
understand this part, and where I see some impact.

Thanks,
Rakib


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 14:28   ` Rakib Mullick
@ 2012-08-16 14:42     ` Peter Zijlstra
  2012-08-16 15:32       ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2012-08-16 14:42 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: mingo, linux-kernel, paulmck

On Thu, 2012-08-16 at 20:28 +0600, Rakib Mullick wrote:

> I'm not sure which parts are missing from Changelog to patch. And this
> patch assumes that, sleeping tasks won't be scattered. From
> select_fallback_rq(), sleeping tasks might get scattered due to
> various cases like. if CPU is down, task isn't allowed to move a
> particular CPU. Other than that, dest cpu supposed to be the same.

Sure but affinities and cpusets can still scatter, and therefore your
logic doesn't hold up, but see below.

> > Furthermore there should be absolutely no impact on load calculation
> > what so ever. nr_uninterruptible is only ever useful as a sum over all
> > cpus, this total sum doesn't change regardless of where you put the
> > value.
> >
> > Worse, there's absolutely no relation to the tasks on the runqueue
> > (sleeping or otherwise) and nr_uninterruptible, so coupling these
> > actions makes no sense what so ever.
> >
> nr_uninterruptible is coupled with tasks on the runqueue to calculate
> nr_active numbers.

It is not.. nr_uninterruptible is incremented on the cpu the task goes
to sleep and decremented on the cpu doing the wakeup.

This means that nr_uninterruptible is a complete mess and any per-cpu
value isn't meaningful at all.

It is quite possible to always have the inc on cpu0 and the decrement on
cpu1, yielding results like:

{1000, -1000} for an effective nr_uninterruptible = 0. Taking either cpu
down will then migrate whatever delta it has to another cpu, but there
might only be a single task, yet the delta is +-1000.

> In calc_load_fold_active(), this nr_active numbers are used to
> calculate delta. This is how I understand this part and seeing some
> impact.

You understand wrong, please re-read the comment added in commit
5167e8d5.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 14:42     ` Peter Zijlstra
@ 2012-08-16 15:32       ` Rakib Mullick
  2012-08-16 17:46         ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-16 15:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulmck

On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2012-08-16 at 20:28 +0600, Rakib Mullick wrote:
>
>> nr_uninterruptible is coupled with tasks on the runqueue to calculate
>> nr_active numbers.
>
> It is not.. nr_uninterruptible is incremented on the cpu the task goes
> to sleep and decremented on the cpu doing the wakeup.
>
If nr_uninterruptible's life cycle is this simple, then while a CPU
goes down its nr_uninterruptible count will be decremented as the
tasks are moved to other CPUs, and everything should be fine.

> This means that nr_uninterruptible is a complete mess and any per-cpu
> value isn't meaningful at all.
>
Well, if nr_uninterruptible is a mess, then this patch has no meaning.
And I also think migrate_nr_uninterruptible() is meaningless.

> It is quite possible to always have the inc on cpu0 and the decrement on
> cpu1, yielding results like:
>
> {1000, -1000} for an effective nr_uninterruptible = 0. Taking either cpu
> down will then migrate whatever delta it has to another cpu, but there
> might only be a single task, yet the delta is +-1000.
>
>> In calc_load_fold_active(), this nr_active numbers are used to
>> calculate delta. This is how I understand this part and seeing some
>> impact.
>
> You understand wrong, please re-read the comment added in commit
> 5167e8d5.
>
Yes, reading.

Thanks,
Rakib.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 15:32       ` Rakib Mullick
@ 2012-08-16 17:46         ` Peter Zijlstra
  2012-08-17 13:39           ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2012-08-16 17:46 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: mingo, linux-kernel, paulmck

On Thu, 2012-08-16 at 21:32 +0600, Rakib Mullick wrote:
> And also I think migrate_nr_uninterruptible() is meaning less too.

Hmm, I think I see a problem.. we forget to migrate the effective delta
created by rq->calc_load_active.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-16 17:46         ` Peter Zijlstra
@ 2012-08-17 13:39           ` Rakib Mullick
  2012-08-20  9:26             ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-17 13:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulmck

On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2012-08-16 at 21:32 +0600, Rakib Mullick wrote:
>> And also I think migrate_nr_uninterruptible() is meaning less too.
>
> Hmm, I think I see a problem.. we forget to migrate the effective delta
> created by rq->calc_load_active.
>
And rq->calc_load_active needs to be migrated to the proper dest_rq,
not to a randomly picked rq as is done currently.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-17 13:39           ` Rakib Mullick
@ 2012-08-20  9:26             ` Peter Zijlstra
  2012-08-20 16:10               ` Rakib Mullick
                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Peter Zijlstra @ 2012-08-20  9:26 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: mingo, linux-kernel, paulmck

On Fri, 2012-08-17 at 19:39 +0600, Rakib Mullick wrote:
> On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2012-08-16 at 21:32 +0600, Rakib Mullick wrote:
> >> And also I think migrate_nr_uninterruptible() is meaning less too.
> >
> > Hmm, I think I see a problem.. we forget to migrate the effective delta
> > created by rq->calc_load_active.
> >
> And rq->calc_load_active needs to be migrated to the proper dest_rq
> not like currently picking any random rq.


OK, so how about something like the below, it would also solve Paul's
issue with that code.


Please do double check the logic, I've had all of 4 hours sleep and it's
far too warm for a brain to operate in any case.

---
Subject: sched: Fix load avg vs cpu-hotplug

Rabik and Paul reported two different issues related to the same few
lines of code.

Rabik's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rabik please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and its using possible lets not bother at all.

The problem Rabik sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.


Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched/core.c | 31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4376c9f..06d23c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5338,27 +5338,17 @@ void idle_task_exit(void)
 }
 
 /*
- * While a dead CPU has no uninterruptible tasks queued at this point,
- * it might still have a nonzero ->nr_uninterruptible counter, because
- * for performance reasons the counter is not stricly tracking tasks to
- * their home CPUs. So we just add the counter to another CPU's counter,
- * to keep the global sum constant after CPU-down:
- */
-static void migrate_nr_uninterruptible(struct rq *rq_src)
-{
-	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
-
-	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
-	rq_src->nr_uninterruptible = 0;
-}
-
-/*
- * remove the tasks which were accounted by rq from calc_load_tasks.
+ * Since this CPU is going 'away' for a while, fold any nr_active delta
+ * we might have. Assumes we're called after migrate_tasks() so that the
+ * nr_active count is stable.
+ *
+ * Also see the comment "Global load-average calculations".
  */
-static void calc_global_load_remove(struct rq *rq)
+static void calc_load_migrate(struct rq *rq)
 {
-	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
-	rq->calc_load_active = 0;
+	long delta = calc_load_fold_active(rq);
+	if (delta)
+		atomic_long_add(delta, &calc_load_tasks);
 }
 
 /*
@@ -5652,8 +5642,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
 
-		migrate_nr_uninterruptible(rq);
-		calc_global_load_remove(rq);
+		calc_load_migrate(rq);
 		break;
 #endif
 	}



* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-20  9:26             ` Peter Zijlstra
@ 2012-08-20 16:10               ` Rakib Mullick
  2012-08-20 16:16                 ` Peter Zijlstra
  2012-08-20 16:26               ` Paul E. McKenney
  2012-09-04 18:43               ` [tip:sched/core] sched: Fix load avg vs cpu-hotplug tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-20 16:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, paulmck

On 8/20/12, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-08-17 at 19:39 +0600, Rakib Mullick wrote:
>> On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, 2012-08-16 at 21:32 +0600, Rakib Mullick wrote:
>> >> And also I think migrate_nr_uninterruptible() is meaning less too.
>> >
>> > Hmm, I think I see a problem.. we forget to migrate the effective delta
>> > created by rq->calc_load_active.
>> >
>> And rq->calc_load_active needs to be migrated to the proper dest_rq
>> not like currently picking any random rq.
>
>
> OK, so how about something like the below, it would also solve Paul's
> issue with that code.
>
>
> Please do double check the logic, I've had all of 4 hours sleep and its
> far too warm for a brain to operate in any case.
>
> ---
> Subject: sched: Fix load avg vs cpu-hotplug
>
> Rabik and Paul reported two different issues related to the same few
> lines of code.
>
First of all, you've misspelled my name, it's Rakib not Rabik.

> Rabik's issue is that the nr_uninterruptible migration code is wrong in
> that he sees artifacts due to this (Rabik please do expand in more
> detail).
>
Okay, I was thinking about per-rq nr_uninterruptible accounting due
to its use in the delta calculation in calc_load_fold_active(). So I
proposed migrating nr_uninterruptible to the rq where the tasks were
migrated, so that it gets folded via update_cpu_load(). Now, note
that the delta is calculated using rq->nr_running and
rq->nr_uninterruptible, so if we migrate tasks into one rq but
migrate nr_uninterruptible into another rq, that's wrong and we're
screwing up the delta calculation.

> Paul's issue is that this code as it stands relies on us using
> stop_machine() for unplug, we all would like to remove this assumption
> so that eventually we can remove this stop_machine() usage altogether.
>
> The only reason we'd have to migrate nr_uninterruptible is so that we
> could use for_each_online_cpu() loops in favour of
> for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> only such loop and its using possible lets not bother at all.
>
> The problem Rabik sees is (probably) caused by the fact that by
> migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> involved.
>
Certainly, we don't care about the rq which is going down, but we do
care about the dest_rq.

> So don't bother with fancy migration schemes (meaning we now have to
> keep using for_each_possible_cpu()) and instead fold any nr_active delta
> after we migrate all tasks away to make sure we don't have any skewed
> nr_active accounting.
>
>
> Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched/core.c | 31 ++++++++++---------------------
>  1 file changed, 10 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4376c9f..06d23c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5338,27 +5338,17 @@ void idle_task_exit(void)
>  }
>
>  /*
> - * While a dead CPU has no uninterruptible tasks queued at this point,
> - * it might still have a nonzero ->nr_uninterruptible counter, because
> - * for performance reasons the counter is not stricly tracking tasks to
> - * their home CPUs. So we just add the counter to another CPU's counter,
> - * to keep the global sum constant after CPU-down:
> - */
> -static void migrate_nr_uninterruptible(struct rq *rq_src)
> -{
> -	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
> -
> -	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
> -	rq_src->nr_uninterruptible = 0;
> -}
> -
> -/*
> - * remove the tasks which were accounted by rq from calc_load_tasks.
> + * Since this CPU is going 'away' for a while, fold any nr_active delta
> + * we might have. Assumes we're called after migrate_tasks() so that the
> + * nr_active count is stable.
> + *
But after migrate_tasks(), it's likely that rq->nr_running will be 1.
Then, nr_active will be screwed. No?

> + * Also see the comment "Global load-average calculations".
>   */
> -static void calc_global_load_remove(struct rq *rq)
> +static void calc_load_migrate(struct rq *rq)
>  {
> -	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
> -	rq->calc_load_active = 0;
> +	long delta = calc_load_fold_active(rq);
> +	if (delta)
> +		atomic_long_add(delta, &calc_load_tasks);
>  }
>
>  /*
> @@ -5652,8 +5642,7 @@ migration_call(struct notifier_block *nfb, unsigned
> long action, void *hcpu)
>  		BUG_ON(rq->nr_running != 1); /* the migration thread */
>  		raw_spin_unlock_irqrestore(&rq->lock, flags);
>
> -		migrate_nr_uninterruptible(rq);
> -		calc_global_load_remove(rq);
> +		calc_load_migrate(rq);
>  		break;
>  #endif
>  	}
>


Thanks,
Rakib


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-20 16:10               ` Rakib Mullick
@ 2012-08-20 16:16                 ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2012-08-20 16:16 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: mingo, linux-kernel, paulmck

On Mon, 2012-08-20 at 22:10 +0600, Rakib Mullick wrote:
> >
> First of all, you've misspelled my name, it's Rakib not Rabik.

Damn, sorry!, lysdexic that.. 

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 4376c9f..06d23c6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5338,27 +5338,17 @@ void idle_task_exit(void)
> >  }
> >
> >  /*
> > - * While a dead CPU has no uninterruptible tasks queued at this point,
> > - * it might still have a nonzero ->nr_uninterruptible counter, because
> > - * for performance reasons the counter is not stricly tracking tasks to
> > - * their home CPUs. So we just add the counter to another CPU's counter,
> > - * to keep the global sum constant after CPU-down:
> > - */
> > -static void migrate_nr_uninterruptible(struct rq *rq_src)
> > -{
> > -     struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
> > -
> > -     rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
> > -     rq_src->nr_uninterruptible = 0;
> > -}
> > -
> > -/*
> > - * remove the tasks which were accounted by rq from calc_load_tasks.
> > + * Since this CPU is going 'away' for a while, fold any nr_active delta
> > + * we might have. Assumes we're called after migrate_tasks() so that the
> > + * nr_active count is stable.
> > + *
> But after migrate_tasks(), it's likely that rq->nr_running will be 1.
> Then, nr_active will be screwed. No? 

Gah indeed. let me try that again.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-20  9:26             ` Peter Zijlstra
  2012-08-20 16:10               ` Rakib Mullick
@ 2012-08-20 16:26               ` Paul E. McKenney
  2012-08-27 18:44                 ` Paul E. McKenney
  2012-09-04 18:43               ` [tip:sched/core] sched: Fix load avg vs cpu-hotplug tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2012-08-20 16:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Rakib Mullick, mingo, linux-kernel

On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
> On Fri, 2012-08-17 at 19:39 +0600, Rakib Mullick wrote:
> > On 8/16/12, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Thu, 2012-08-16 at 21:32 +0600, Rakib Mullick wrote:
> > >> And also I think migrate_nr_uninterruptible() is meaning less too.
> > >
> > > Hmm, I think I see a problem.. we forget to migrate the effective delta
> > > created by rq->calc_load_active.
> > >
> > And rq->calc_load_active needs to be migrated to the proper dest_rq
> > not like currently picking any random rq.
> 
> 
> OK, so how about something like the below, it would also solve Paul's
> issue with that code.
> 
> 
> Please do double check the logic, I've had all of 4 hours sleep and its
> far too warm for a brain to operate in any case.
> 
> ---
> Subject: sched: Fix load avg vs cpu-hotplug
> 
> Rabik and Paul reported two different issues related to the same few
> lines of code.
> 
> Rabik's issue is that the nr_uninterruptible migration code is wrong in
> that he sees artifacts due to this (Rabik please do expand in more
> detail).
> 
> Paul's issue is that this code as it stands relies on us using
> stop_machine() for unplug, we all would like to remove this assumption
> so that eventually we can remove this stop_machine() usage altogether.
> 
> The only reason we'd have to migrate nr_uninterruptible is so that we
> could use for_each_online_cpu() loops in favour of
> for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> only such loop and its using possible lets not bother at all.
> 
> The problem Rabik sees is (probably) caused by the fact that by
> migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> involved.
> 
> So don't bother with fancy migration schemes (meaning we now have to
> keep using for_each_possible_cpu()) and instead fold any nr_active delta
> after we migrate all tasks away to make sure we don't have any skewed
> nr_active accounting.
> 
> 
> Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/sched/core.c | 31 ++++++++++---------------------
>  1 file changed, 10 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4376c9f..06d23c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5338,27 +5338,17 @@ void idle_task_exit(void)
>  }
>  
>  /*
> - * While a dead CPU has no uninterruptible tasks queued at this point,
> - * it might still have a nonzero ->nr_uninterruptible counter, because
> - * for performance reasons the counter is not stricly tracking tasks to
> - * their home CPUs. So we just add the counter to another CPU's counter,
> - * to keep the global sum constant after CPU-down:
> - */
> -static void migrate_nr_uninterruptible(struct rq *rq_src)
> -{
> -	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
> -
> -	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
> -	rq_src->nr_uninterruptible = 0;
> -}
> -
> -/*
> - * remove the tasks which were accounted by rq from calc_load_tasks.
> + * Since this CPU is going 'away' for a while, fold any nr_active delta
> + * we might have. Assumes we're called after migrate_tasks() so that the
> + * nr_active count is stable.
> + *
> + * Also see the comment "Global load-average calculations".
>   */
> -static void calc_global_load_remove(struct rq *rq)
> +static void calc_load_migrate(struct rq *rq)
>  {
> -	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
> -	rq->calc_load_active = 0;
> +	long delta = calc_load_fold_active(rq);
> +	if (delta)
> +		atomic_long_add(delta, &calc_load_tasks);
>  }
>  
>  /*
> @@ -5652,8 +5642,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  		BUG_ON(rq->nr_running != 1); /* the migration thread */
>  		raw_spin_unlock_irqrestore(&rq->lock, flags);
>  
> -		migrate_nr_uninterruptible(rq);
> -		calc_global_load_remove(rq);
> +		calc_load_migrate(rq);

Not sure that it matters, but...

This is called from the CPU_DYING notifier, which runs with irqs
disabled, but in process context.  As I understand it, this means that
->nr_running==1.  If my understanding is correct (ha!), this means that
this change sets ->calc_load_active to one (rather than zero as in the
original) and that it subtracts one fewer from calc_load_tasks than did
the original.  Of course, I have no idea whether this matters.

If I am correct and if it does matter, one straightforward fix
is to add a "CPU_DEAD" branch to the switch statement and move the
"calc_load_migrate(rq)" to that new branch.  Given that "rq" references
the outgoing CPU, my guess is that locking is not needed, but you would
know better than I.

							Thanx, Paul

>  		break;
>  #endif
>  	}
> 



* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-20 16:26               ` Paul E. McKenney
@ 2012-08-27 18:44                 ` Paul E. McKenney
  2012-08-28  6:57                   ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2012-08-27 18:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Rakib Mullick, mingo, linux-kernel

On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:

[ . . . ]

> > OK, so how about something like the below, it would also solve Paul's
> > issue with that code.
> > 
> > 
> > Please do double check the logic, I've had all of 4 hours sleep and its
> > far too warm for a brain to operate in any case.
> > 
> > ---
> > Subject: sched: Fix load avg vs cpu-hotplug
> > 
> > Rabik and Paul reported two different issues related to the same few
> > lines of code.
> > 
> > Rabik's issue is that the nr_uninterruptible migration code is wrong in
> > that he sees artifacts due to this (Rabik please do expand in more
> > detail).
> > 
> > Paul's issue is that this code as it stands relies on us using
> > stop_machine() for unplug, we all would like to remove this assumption
> > so that eventually we can remove this stop_machine() usage altogether.
> > 
> > The only reason we'd have to migrate nr_uninterruptible is so that we
> > could use for_each_online_cpu() loops in favour of
> > for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> > only such loop and its using possible lets not bother at all.
> > 
> > The problem Rabik sees is (probably) caused by the fact that by
> > migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> > involved.
> > 
> > So don't bother with fancy migration schemes (meaning we now have to
> > keep using for_each_possible_cpu()) and instead fold any nr_active delta
> > after we migrate all tasks away to make sure we don't have any skewed
> > nr_active accounting.
> > 
> > 
> > Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
> > Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> >  kernel/sched/core.c | 31 ++++++++++---------------------
> >  1 file changed, 10 insertions(+), 21 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 4376c9f..06d23c6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5338,27 +5338,17 @@ void idle_task_exit(void)
> >  }
> >  
> >  /*
> > - * While a dead CPU has no uninterruptible tasks queued at this point,
> > - * it might still have a nonzero ->nr_uninterruptible counter, because
> > - * for performance reasons the counter is not stricly tracking tasks to
> > - * their home CPUs. So we just add the counter to another CPU's counter,
> > - * to keep the global sum constant after CPU-down:
> > - */
> > -static void migrate_nr_uninterruptible(struct rq *rq_src)
> > -{
> > -	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
> > -
> > -	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
> > -	rq_src->nr_uninterruptible = 0;
> > -}
> > -
> > -/*
> > - * remove the tasks which were accounted by rq from calc_load_tasks.
> > + * Since this CPU is going 'away' for a while, fold any nr_active delta
> > + * we might have. Assumes we're called after migrate_tasks() so that the
> > + * nr_active count is stable.
> > + *
> > + * Also see the comment "Global load-average calculations".
> >   */
> > -static void calc_global_load_remove(struct rq *rq)
> > +static void calc_load_migrate(struct rq *rq)
> >  {
> > -	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
> > -	rq->calc_load_active = 0;
> > +	long delta = calc_load_fold_active(rq);
> > +	if (delta)
> > +		atomic_long_add(delta, &calc_load_tasks);
> >  }
> >  
> >  /*
> > @@ -5652,8 +5642,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
> >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> >  
> > -		migrate_nr_uninterruptible(rq);
> > -		calc_global_load_remove(rq);
> > +		calc_load_migrate(rq);
> 
> Not sure that it matters, but...
> 
> This is called from the CPU_DYING notifier, which runs with irqs
> disabled, but in process context.  As I understand it, this means that
> ->nr_running==1.  If my understanding is correct (ha!), this means that
> this change sets ->calc_load_active to one (rather than zero as in the
> original) and that it subtracts one fewer from calc_load_tasks than did
> the original.  Of course, I have no idea whether this matters.
> 
> If I am correct and if it does matter, one straightforward fix
> is to add a "CPU_DEAD" branch to the switch statement and move the
> "calc_load_migrate(rq)" to that new branch.  Given that "rq" references
> the outgoing CPU, my guess is that locking is not needed, but you would
> know better than I.

How about the following updated patch?

							Thanx, Paul

------------------------------------------------------------------------

sched: Fix load avg vs cpu-hotplug

Rabik and Paul reported two different issues related to the same few
lines of code.

Rabik's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rabik please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and its using possible lets not bother at all.

The problem Rabik sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.

[ paulmck: Move call to calc_load_migration to CPU_DEAD to avoid
miscounting noted by Rakib. ]

Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e841dfc..a8807f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5309,27 +5309,17 @@ void idle_task_exit(void)
 }
 
 /*
- * While a dead CPU has no uninterruptible tasks queued at this point,
- * it might still have a nonzero ->nr_uninterruptible counter, because
- * for performance reasons the counter is not stricly tracking tasks to
- * their home CPUs. So we just add the counter to another CPU's counter,
- * to keep the global sum constant after CPU-down:
- */
-static void migrate_nr_uninterruptible(struct rq *rq_src)
-{
-	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
-
-	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
-	rq_src->nr_uninterruptible = 0;
-}
-
-/*
- * remove the tasks which were accounted by rq from calc_load_tasks.
+ * Since this CPU is going 'away' for a while, fold any nr_active delta
+ * we might have. Assumes we're called after migrate_tasks() so that the
+ * nr_active count is stable.
+ *
+ * Also see the comment "Global load-average calculations".
  */
-static void calc_global_load_remove(struct rq *rq)
+static void calc_load_migrate(struct rq *rq)
 {
-	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
-	rq->calc_load_active = 0;
+	long delta = calc_load_fold_active(rq);
+	if (delta)
+		atomic_long_add(delta, &calc_load_tasks);
 }
 
 /*
@@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		break;
 
-		migrate_nr_uninterruptible(rq);
-		calc_global_load_remove(rq);
+	case CPU_DEAD:
+		{
+			struct rq *dest_rq;
+
+			local_irq_save(flags);
+			dest_rq = cpu_rq(smp_processor_id());
+			raw_spin_lock(&dest_rq->lock);
+			calc_load_migrate(rq);
+			raw_spin_unlock_irqrestore(&dest_rq->lock, flags);
+		}
 		break;
 #endif
 	}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-27 18:44                 ` Paul E. McKenney
@ 2012-08-28  6:57                   ` Rakib Mullick
  2012-08-28 13:42                     ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-28  6:57 UTC (permalink / raw)
  To: paulmck; +Cc: Peter Zijlstra, mingo, linux-kernel

Hello Paul,

On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
>> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
>
> How about the following updated patch?
>
Actually, I was waiting for Peter's update.

> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> sched: Fix load avg vs cpu-hotplug
>
> [ . . . ]
>
> @@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb, unsigned
> long action, void *hcpu)
>  		migrate_tasks(cpu);
>  		BUG_ON(rq->nr_running != 1); /* the migration thread */
>  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> +		break;
>
> -		migrate_nr_uninterruptible(rq);
> -		calc_global_load_remove(rq);
> +	case CPU_DEAD:
> +		{
> +			struct rq *dest_rq;
> +
> +			local_irq_save(flags);
> +			dest_rq = cpu_rq(smp_processor_id());

Use of smp_processor_id() as the destination CPU isn't clear to me;
this processor is about to go down, isn't it?

> +			raw_spin_lock(&dest_rq->lock);
> +			calc_load_migrate(rq);

Well, calc_load_migrate() has no impact, because rq->nr_running == 1 at
this point; that has already been pointed out previously.

Thanks,
Rakib


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-28  6:57                   ` Rakib Mullick
@ 2012-08-28 13:42                     ` Paul E. McKenney
  2012-08-28 16:52                       ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2012-08-28 13:42 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: Peter Zijlstra, mingo, linux-kernel

On Tue, Aug 28, 2012 at 12:57:09PM +0600, Rakib Mullick wrote:
> Hello Paul,
> 
> On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
> >> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
> >
> > How about the following updated patch?
> >
> Actually, I was waiting for Peter's update.

I was too, but I chatted with Peter.

> > 							Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > sched: Fix load avg vs cpu-hotplug
> >
> > [ . . . ]
> >
> > @@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb, unsigned
> > long action, void *hcpu)
> >  		migrate_tasks(cpu);
> >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
> >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> > +		break;
> >
> > -		migrate_nr_uninterruptible(rq);
> > -		calc_global_load_remove(rq);
> > +	case CPU_DEAD:
> > +		{
> > +			struct rq *dest_rq;
> > +
> > +			local_irq_save(flags);
> > +			dest_rq = cpu_rq(smp_processor_id());
> 
> Use of smp_processor_id() as the destination CPU isn't clear to me;
> this processor is about to go down, isn't it?

Nope.  The CPU_DEAD notifier happens after the outgoing CPU has been
fully offlined, and so it must run on some other CPU.

> > +			raw_spin_lock(&dest_rq->lock);
> > +			calc_load_migrate(rq);
> 
> Well, calc_load_migrate() has no impact, because rq->nr_running == 1 at
> this point; that has already been pointed out previously.

Even after the outgoing CPU is fully gone?  I would hope that the value
would be zero.

							Thanx, Paul



* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-28 13:42                     ` Paul E. McKenney
@ 2012-08-28 16:52                       ` Rakib Mullick
  2012-08-28 17:07                         ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Rakib Mullick @ 2012-08-28 16:52 UTC (permalink / raw)
  To: paulmck; +Cc: Peter Zijlstra, mingo, linux-kernel

On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Aug 28, 2012 at 12:57:09PM +0600, Rakib Mullick wrote:
>> Hello Paul,
>>
>> On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>> > On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
>> >> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
>> >
>> > How about the following updated patch?
>> >
>> Actually, I was waiting for Peter's update.
>
> I was too, but chatted with Peter.
>
>> > 							Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > sched: Fix load avg vs cpu-hotplug
>> >
>> > [ . . . ]
>> >
>> > @@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb,
>> > unsigned
>> > long action, void *hcpu)
>> >  		migrate_tasks(cpu);
>> >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
>> >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
>> > +		break;
>> >
>> > -		migrate_nr_uninterruptible(rq);
>> > -		calc_global_load_remove(rq);
>> > +	case CPU_DEAD:
>> > +		{
>> > +			struct rq *dest_rq;
>> > +
>> > +			local_irq_save(flags);
>> > +			dest_rq = cpu_rq(smp_processor_id());
>>
>> Use of smp_processor_id() as the destination CPU isn't clear to me;
>> this processor is about to go down, isn't it?
>
> Nope.  The CPU_DEAD notifier happens after the outgoing CPU has been
> fully offlined, and so it must run on some other CPU.
>
>> > +			raw_spin_lock(&dest_rq->lock);
>> > +			calc_load_migrate(rq);
>>
>> Well, calc_load_migrate() has no impact, because rq->nr_running == 1 at
>> this point; that has already been pointed out previously.
>
> Even after the outgoing CPU is fully gone?  I would hope that the value
> would be zero.
>
Perhaps, yes, and it doesn't make any difference. So I'm not sure about
doing calc_load_migrate() at this point. But I am sure that this is not
what I had in mind.

The patch I sent moved rq->nr_uninterruptible to the dest_rq where all
the tasks were moved. Peter said that patch isn't correct, because tasks
might get spread out amongst more than one CPU due to task affinity
(*if* a task was affined). But we can reasonably expect that an admin is
smart enough not to put a CPU offline that s/he explicitly made a task
run on. I agree that my patch isn't 100% correct, but it's better than
the current code, which is almost 100% wrong: it picks a random CPU's rq
to move the rq->nr_uninterruptible count to. So simply moving
->nr_uninterruptible to the dest_rq (where the tasks were moved) should
be fine. And when dest_rq's time comes to go idle, i.e.
calc_load_enter_idle() (or calc_load_account_active() if NO_HZ=n), the
active count will get folded; we do not need to explicitly fold the
nr_active count when a CPU is going to die.

Thanks,
Rakib.


* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-28 16:52                       ` Rakib Mullick
@ 2012-08-28 17:07                         ` Paul E. McKenney
  2012-08-29  1:05                           ` Rakib Mullick
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2012-08-28 17:07 UTC (permalink / raw)
  To: Rakib Mullick; +Cc: Peter Zijlstra, mingo, linux-kernel

On Tue, Aug 28, 2012 at 10:52:45PM +0600, Rakib Mullick wrote:
> On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > On Tue, Aug 28, 2012 at 12:57:09PM +0600, Rakib Mullick wrote:
> >> Hello Paul,
> >>
> >> On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Mon, Aug 20, 2012 at 09:26:57AM -0700, Paul E. McKenney wrote:
> >> >> On Mon, Aug 20, 2012 at 11:26:57AM +0200, Peter Zijlstra wrote:
> >> >
> >> > How about the following updated patch?
> >> >
> >> Actually, I was waiting for Peter's update.
> >
> > I was too, but chatted with Peter.
> >
> >> > 							Thanx, Paul
> >> >
> >> > ------------------------------------------------------------------------
> >> >
> >> > sched: Fix load avg vs cpu-hotplug
> >> >
> >> > [ . . . ]
> >> >
> >> > @@ -5622,9 +5612,18 @@ migration_call(struct notifier_block *nfb,
> >> > unsigned
> >> > long action, void *hcpu)
> >> >  		migrate_tasks(cpu);
> >> >  		BUG_ON(rq->nr_running != 1); /* the migration thread */
> >> >  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> >> > +		break;
> >> >
> >> > -		migrate_nr_uninterruptible(rq);
> >> > -		calc_global_load_remove(rq);
> >> > +	case CPU_DEAD:
> >> > +		{
> >> > +			struct rq *dest_rq;
> >> > +
> >> > +			local_irq_save(flags);
> >> > +			dest_rq = cpu_rq(smp_processor_id());
> >>
> >> Use of smp_processor_id() as the destination CPU isn't clear to me;
> >> this processor is about to go down, isn't it?
> >
> > Nope.  The CPU_DEAD notifier happens after the outgoing CPU has been
> > fully offlined, and so it must run on some other CPU.
> >
> >> > +			raw_spin_lock(&dest_rq->lock);
> >> > +			calc_load_migrate(rq);
> >>
> >> Well, calc_load_migrate() has no impact, because rq->nr_running == 1 at
> >> this point; that has already been pointed out previously.
> >
> > Even after the outgoing CPU is fully gone?  I would hope that the value
> > would be zero.
> >
> Perhaps, yes, and it doesn't make any difference. So I'm not sure about
> doing calc_load_migrate() at this point. But I am sure that this is not
> what I had in mind.
> 
> The patch I sent moved rq->nr_uninterruptible to the dest_rq where all
> the tasks were moved. Peter said that patch isn't correct, because tasks
> might get spread out amongst more than one CPU due to task affinity
> (*if* a task was affined). But we can reasonably expect that an admin is
> smart enough not to put a CPU offline that s/he explicitly made a task
> run on. I agree that my patch isn't 100% correct, but it's better than
> the current code, which is almost 100% wrong: it picks a random CPU's rq
> to move the rq->nr_uninterruptible count to. So simply moving
> ->nr_uninterruptible to the dest_rq (where the tasks were moved) should
> be fine. And when dest_rq's time comes to go idle, i.e.
> calc_load_enter_idle() (or calc_load_account_active() if NO_HZ=n), the
> active count will get folded; we do not need to explicitly fold the
> nr_active count when a CPU is going to die.

OK, but I thought that Peter said that ->nr_uninterruptible was
meaningful only when summed across all CPUs.  If that is the case,
it shouldn't matter where the counts are moved.

(I am not all that worried about the exact form of the patch, as long
as it allows us to get rid of the __stop_machine() in CPU offlining.)

							Thanx, Paul



* Re: Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down.
  2012-08-28 17:07                         ` Paul E. McKenney
@ 2012-08-29  1:05                           ` Rakib Mullick
  0 siblings, 0 replies; 26+ messages in thread
From: Rakib Mullick @ 2012-08-29  1:05 UTC (permalink / raw)
  To: paulmck; +Cc: Peter Zijlstra, mingo, linux-kernel

On 8/28/12, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> OK, but I thought that Peter said that ->nr_uninterruptible was
> meaningful only when summed across all CPUs.  If that is the case,
> it shouldn't matter where the counts are moved.
>
Yes, right. But nr_uninterruptible is also used to calculate the
per-rq delta; please see calc_load_fold_active().

Thanks,
Rakib.


* [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-08-20  9:26             ` Peter Zijlstra
  2012-08-20 16:10               ` Rakib Mullick
  2012-08-20 16:26               ` Paul E. McKenney
@ 2012-09-04 18:43               ` tip-bot for Peter Zijlstra
  2012-09-05 12:36                 ` Peter Zijlstra
  2 siblings, 1 reply; 26+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-09-04 18:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, rakib.mullick, a.p.zijlstra, peterz,
	paulmck, tglx

Commit-ID:  f319da0c6894fcf55e21320e40506418a2aad629
Gitweb:     http://git.kernel.org/tip/f319da0c6894fcf55e21320e40506418a2aad629
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 4 Sep 2012 14:30:18 +0200

sched: Fix load avg vs cpu-hotplug

Rabik and Paul reported two different issues related to the same few
lines of code.

Rabik's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rabik please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and its using possible lets not bother at all.

The problem Rabik sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.

Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1345454817.23018.27.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |   31 ++++++++++---------------------
 1 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..207a81c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5304,27 +5304,17 @@ void idle_task_exit(void)
 }
 
 /*
- * While a dead CPU has no uninterruptible tasks queued at this point,
- * it might still have a nonzero ->nr_uninterruptible counter, because
- * for performance reasons the counter is not stricly tracking tasks to
- * their home CPUs. So we just add the counter to another CPU's counter,
- * to keep the global sum constant after CPU-down:
- */
-static void migrate_nr_uninterruptible(struct rq *rq_src)
-{
-	struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
-
-	rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
-	rq_src->nr_uninterruptible = 0;
-}
-
-/*
- * remove the tasks which were accounted by rq from calc_load_tasks.
+ * Since this CPU is going 'away' for a while, fold any nr_active delta
+ * we might have. Assumes we're called after migrate_tasks() so that the
+ * nr_active count is stable.
+ *
+ * Also see the comment "Global load-average calculations".
  */
-static void calc_global_load_remove(struct rq *rq)
+static void calc_load_migrate(struct rq *rq)
 {
-	atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
-	rq->calc_load_active = 0;
+	long delta = calc_load_fold_active(rq);
+	if (delta)
+		atomic_long_add(delta, &calc_load_tasks);
 }
 
 /*
@@ -5618,8 +5608,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
 
-		migrate_nr_uninterruptible(rq);
-		calc_global_load_remove(rq);
+		calc_load_migrate(rq);
 		break;
 #endif
 	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-04 18:43               ` [tip:sched/core] sched: Fix load avg vs cpu-hotplug tip-bot for Peter Zijlstra
@ 2012-09-05 12:36                 ` Peter Zijlstra
  2012-09-05 13:29                   ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2012-09-05 12:36 UTC (permalink / raw)
  To: paulmck, mingo, hpa, linux-kernel, rakib.mullick, tglx; +Cc: linux-tip-commits

On Tue, 2012-09-04 at 11:43 -0700, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  f319da0c6894fcf55e21320e40506418a2aad629
> Gitweb:     http://git.kernel.org/tip/f319da0c6894fcf55e21320e40506418a2aad629
> Author:     Peter Zijlstra <peterz@infradead.org>
> AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 4 Sep 2012 14:30:18 +0200
> 
> sched: Fix load avg vs cpu-hotplug
> 
> Rabik and Paul reported two different issues related to the same few
> lines of code.
> 
> Rabik's issue is that the nr_uninterruptible migration code is wrong in
> that he sees artifacts due to this (Rabik please do expand in more
> detail).
> 
> Paul's issue is that this code as it stands relies on us using
> stop_machine() for unplug, we all would like to remove this assumption
> so that eventually we can remove this stop_machine() usage altogether.
> 
> The only reason we'd have to migrate nr_uninterruptible is so that we
> could use for_each_online_cpu() loops in favour of
> for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> only such loop and its using possible lets not bother at all.
> 
> The problem Rabik sees is (probably) caused by the fact that by
> migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> involved.
> 
> So don't bother with fancy migration schemes (meaning we now have to
> keep using for_each_possible_cpu()) and instead fold any nr_active delta
> after we migrate all tasks away to make sure we don't have any skewed
> nr_active accounting.

Oh argh.. this patch isn't actually right.. I actually removed it from
my series but forgot to update the tarball.

Ingo can you still make it go away or should I do a delta?


* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 12:36                 ` Peter Zijlstra
@ 2012-09-05 13:29                   ` Ingo Molnar
  2012-09-05 17:01                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2012-09-05 13:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, hpa, linux-kernel, rakib.mullick, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2012-09-04 at 11:43 -0700, tip-bot for Peter Zijlstra wrote:
> > Commit-ID:  f319da0c6894fcf55e21320e40506418a2aad629
> > Gitweb:     http://git.kernel.org/tip/f319da0c6894fcf55e21320e40506418a2aad629
> > Author:     Peter Zijlstra <peterz@infradead.org>
> > AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 4 Sep 2012 14:30:18 +0200
> > 
> > sched: Fix load avg vs cpu-hotplug
> > 
> > Rabik and Paul reported two different issues related to the same few
> > lines of code.
> > 
> > Rabik's issue is that the nr_uninterruptible migration code is wrong in
> > that he sees artifacts due to this (Rabik please do expand in more
> > detail).
> > 
> > Paul's issue is that this code as it stands relies on us using
> > stop_machine() for unplug, we all would like to remove this assumption
> > so that eventually we can remove this stop_machine() usage altogether.
> > 
> > The only reason we'd have to migrate nr_uninterruptible is so that we
> > could use for_each_online_cpu() loops in favour of
> > for_each_possible_cpu() loops, however since nr_uninterruptible() is the
> > only such loop and its using possible lets not bother at all.
> > 
> > The problem Rabik sees is (probably) caused by the fact that by
> > migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
> > involved.
> > 
> > So don't bother with fancy migration schemes (meaning we now have to
> > keep using for_each_possible_cpu()) and instead fold any nr_active delta
> > after we migrate all tasks away to make sure we don't have any skewed
> > nr_active accounting.
> 
> Oh argh.. this patch isn't actually right.. I actually removed 
> it from my series but forgot to update the tarball.

Sigh.

> Ingo can you still make it go away or should I do a delta?

Please do a delta.

Thanks,

	Ingo


* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 13:29                   ` Ingo Molnar
@ 2012-09-05 17:01                     ` Peter Zijlstra
  2012-09-05 17:34                       ` Ingo Molnar
  2012-09-05 22:03                       ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2012-09-05 17:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: paulmck, hpa, linux-kernel, rakib.mullick, tglx, linux-tip-commits

On Wed, 2012-09-05 at 15:29 +0200, Ingo Molnar wrote:
> > Oh argh.. this patch isn't actually right.. I actually removed 
> > it from my series but forgot to update the tarball.
> 
> Sigh.

Yeah, sorry about that, jet-lag makes me do stupid at a higher rate than
usual :/

> > Ingo can you still make it go away or should I do a delta?
> 
> Please do a delta. 

Ok, will do.


* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 17:01                     ` Peter Zijlstra
@ 2012-09-05 17:34                       ` Ingo Molnar
  2012-09-05 22:03                       ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2012-09-05 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, hpa, linux-kernel, rakib.mullick, tglx, linux-tip-commits


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2012-09-05 at 15:29 +0200, Ingo Molnar wrote:
> > > Oh argh.. this patch isn't actually right.. I actually removed 
> > > it from my series but forgot to update the tarball.
> > 
> > Sigh.
> 
> Yeah, sorry about that, jet-lag makes me do stupid at a higher 
> rate than usual :/

No problem! :-)

> > > Ingo can you still make it go away or should I do a delta?
> > 
> > Please do a delta.
> 
> Ok, will do.

Thanks!

	Ingo


* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 17:01                     ` Peter Zijlstra
  2012-09-05 17:34                       ` Ingo Molnar
@ 2012-09-05 22:03                       ` Peter Zijlstra
  2012-09-05 23:39                         ` Paul E. McKenney
                                           ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Peter Zijlstra @ 2012-09-05 22:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: paulmck, hpa, linux-kernel, rakib.mullick, tglx, linux-tip-commits

On Wed, 2012-09-05 at 19:01 +0200, Peter Zijlstra wrote:
> > Please do a delta.

OK, so I suppose something like the below ought to do. Paul, it's slightly
different from the one in your tree; given the changelog below, do you
see anything wrong with it?

Rakib, again, sorry for getting your name wrong, and this time for
getting it merged :/

---
Subject: sched: Fix load avg vs cpu-hotplug mk-II

Commit f319da0c68 ("sched: Fix load avg vs cpu-hotplug") was a known
broken version that got in by accident.

In particular, the problem is that at the point it calls
calc_load_migrate() nr_running := 1 (the stopper thread), so move the
call to CPU_DEAD where we're sure that nr_running := 0.

Also note that we can call calc_load_migrate() without serialization, we
know the state of rq is stable since its cpu is dead, and we modify the
global state using appropriate atomic ops. 

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c46a011..8c089cb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5086,7 +5086,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		break;
 
+	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif



* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 22:03                       ` Peter Zijlstra
@ 2012-09-05 23:39                         ` Paul E. McKenney
  2012-09-06  3:30                         ` Rakib Mullick
  2012-09-14  6:14                         ` [tip:sched/core] sched: Fix load avg vs. cpu-hotplug tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2012-09-05 23:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, hpa, linux-kernel, rakib.mullick, tglx, linux-tip-commits

On Thu, Sep 06, 2012 at 12:03:50AM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 19:01 +0200, Peter Zijlstra wrote:
> > > Please do a delta.
> 
> OK, so I suppose something like the below ought to do. Paul its slightly
> different than the one in your tree, given the changelog below, do you
> see anything wrong with it?
> 
> Rakib, again, sorry for getting your name wrong, and this time for
> getting it merged :/
> 
> ---
> Subject: sched: Fix load avg vs cpu-hotplug mk-II
> 
> Commit f319da0c68 ("sched: Fix load avg vs cpu-hotplug") was a known
> broken version that got in by accident.
> 
> In particular, the problem is that at the point it calls
> calc_load_migrate() nr_running := 1 (the stopper thread), so move the
> call to CPU_DEAD where we're sure that nr_running := 0.
> 
> Also note that we can call calc_load_migrate() without serialization, we
> know the state of rq is stable since its cpu is dead, and we modify the
> global state using appropriate atomic ops. 
> 
> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

Given your point about atomic ops, my version was indeed overkill.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  kernel/sched/core.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c46a011..8c089cb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5086,7 +5086,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  		migrate_tasks(cpu);
>  		BUG_ON(rq->nr_running != 1); /* the migration thread */
>  		raw_spin_unlock_irqrestore(&rq->lock, flags);
> +		break;
>  
> +	case CPU_DEAD:
>  		calc_load_migrate(rq);
>  		break;
>  #endif
> 



* Re: [tip:sched/core] sched: Fix load avg vs cpu-hotplug
  2012-09-05 22:03                       ` Peter Zijlstra
  2012-09-05 23:39                         ` Paul E. McKenney
@ 2012-09-06  3:30                         ` Rakib Mullick
  2012-09-14  6:14                         ` [tip:sched/core] sched: Fix load avg vs. cpu-hotplug tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 26+ messages in thread
From: Rakib Mullick @ 2012-09-06  3:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, paulmck, hpa, linux-kernel, tglx, linux-tip-commits

On 9/6/12, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2012-09-05 at 19:01 +0200, Peter Zijlstra wrote:
>> > Please do a delta.
>
> OK, so I suppose something like the below ought to do. Paul its slightly
> different than the one in your tree, given the changelog below, do you
> see anything wrong with it?
>
> Rakib, again, sorry for getting your name wrong, and this time for
> getting it merged :/
>
It's okay, no problem. I only pointed out the mistakes; I didn't take it
too seriously. (Actually, my friends often call me by names that are
nowhere near "Rakib" or "Rabik", and those sound worse than "Rabik" ;-).
So I've had to cope with it :-).)

Thanks,
Rakib.


* [tip:sched/core] sched: Fix load avg vs. cpu-hotplug
  2012-09-05 22:03                       ` Peter Zijlstra
  2012-09-05 23:39                         ` Paul E. McKenney
  2012-09-06  3:30                         ` Rakib Mullick
@ 2012-09-14  6:14                         ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-09-14  6:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulmck, hpa, mingo, a.p.zijlstra, peterz, tglx

Commit-ID:  08bedae1d0acd8c9baf514fb69fa199d0c8345f6
Gitweb:     http://git.kernel.org/tip/08bedae1d0acd8c9baf514fb69fa199d0c8345f6
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 6 Sep 2012 00:03:50 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 13 Sep 2012 16:52:05 +0200

sched: Fix load avg vs. cpu-hotplug

Commit f319da0c68 ("sched: Fix load avg vs cpu-hotplug") was an
incomplete fix:

In particular, the problem is that at the point it calls
calc_load_migrate() nr_running := 1 (the stopper thread), so move the
call to CPU_DEAD where we're sure that nr_running := 0.

Also note that we can call calc_load_migrate() without serialization, we
know the state of rq is stable since its cpu is dead, and we modify the
global state using appropriate atomic ops.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1346882630.2600.59.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b51b2d..ba144b1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5048,7 +5048,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		migrate_tasks(cpu);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
+		break;
 
+	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif


Thread overview: 26+ messages
2012-08-16 13:45 Add rq->nr_uninterruptible count to dest cpu's rq while CPU goes down Rakib Mullick
2012-08-16 13:56 ` Peter Zijlstra
2012-08-16 14:28   ` Rakib Mullick
2012-08-16 14:42     ` Peter Zijlstra
2012-08-16 15:32       ` Rakib Mullick
2012-08-16 17:46         ` Peter Zijlstra
2012-08-17 13:39           ` Rakib Mullick
2012-08-20  9:26             ` Peter Zijlstra
2012-08-20 16:10               ` Rakib Mullick
2012-08-20 16:16                 ` Peter Zijlstra
2012-08-20 16:26               ` Paul E. McKenney
2012-08-27 18:44                 ` Paul E. McKenney
2012-08-28  6:57                   ` Rakib Mullick
2012-08-28 13:42                     ` Paul E. McKenney
2012-08-28 16:52                       ` Rakib Mullick
2012-08-28 17:07                         ` Paul E. McKenney
2012-08-29  1:05                           ` Rakib Mullick
2012-09-04 18:43               ` [tip:sched/core] sched: Fix load avg vs cpu-hotplug tip-bot for Peter Zijlstra
2012-09-05 12:36                 ` Peter Zijlstra
2012-09-05 13:29                   ` Ingo Molnar
2012-09-05 17:01                     ` Peter Zijlstra
2012-09-05 17:34                       ` Ingo Molnar
2012-09-05 22:03                       ` Peter Zijlstra
2012-09-05 23:39                         ` Paul E. McKenney
2012-09-06  3:30                         ` Rakib Mullick
2012-09-14  6:14                         ` [tip:sched/core] sched: Fix load avg vs. cpu-hotplug tip-bot for Peter Zijlstra
