* possible migration bug with hotplug cpu
@ 2009-07-08 15:48 Lucas De Marchi
  2009-07-08 15:55 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 15:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

I was analyzing the number of migrations in my application, and I think
there's a bug in this accounting or, even worse, in the migration mechanism
when used together with cpu hotplug.

I turned off all CPUs except one using the hotplug mechanism, after which I
launched my application, which has 8 threads. Before they finish, the threads
print their /proc/<tid>/sched files. With only 1 CPU online, there are still
~200 migrations per thread. The function set_task_cpu is responsible for
updating the migration counter and is called from 9 other functions. With some
tests I discovered that 95% of these migrations come from try_to_wake_up and
the other 5% from pull_task and __migrate_task.
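
For reference, a stripped-down sketch of this kind of test (not my actual
application, which does real work per wakeup; the thread body here is only
illustrative) would look like this. Each thread sleeps and wakes repeatedly,
so every wakeup passes through try_to_wake_up(), and then prints its
migration counters:

/* repro.c - build with: gcc -pthread -o repro repro.c
 * Before running, offline every CPU but cpu0, e.g.:
 *   echo 0 > /sys/devices/system/cpu/cpu1/online   (repeat for each cpu)
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define NR_THREADS 8

static void *worker(void *arg)
{
	char path[64], line[128];
	long tid = syscall(SYS_gettid);
	FILE *f;
	int i;

	/* each usleep() wakeup goes through try_to_wake_up() */
	for (i = 0; i < 1000; i++)
		usleep(1000);

	/* dump this thread's migration counters from /proc */
	snprintf(path, sizeof(path), "/proc/%d/task/%ld/sched",
		 getpid(), tid);
	f = fopen(path, "r");
	if (f) {
		while (fgets(line, sizeof(line), f))
			if (strstr(line, "nr_migrations"))
				fputs(line, stdout);
		fclose(f);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}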

Looking at try_to_wake_up:

...
	cpu = task_cpu(p);
	orig_cpu = cpu;
	this_cpu = smp_processor_id();

#ifdef CONFIG_SMP
	if (unlikely(task_running(rq, p)))
		goto out_activate;

	cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
	if (cpu != orig_cpu) {                          //<<<<===
		set_task_cpu(p, cpu);
...

p->sched_class->select_task_rq(p, sync) is returning a cpu different from
task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
the same for rt and normal tasks. For RT, the only possible problem could be
in find_lowest_rq, but I'm still trying to find out why. Since you have more
experience with this code, I'd appreciate it if you could give it a look.

Is there any obscure reason why this behavior could be right?

Lucas De Marchi


* Re: possible migration bug with hotplug cpu
  2009-07-08 15:48 possible migration bug with hotplug cpu Lucas De Marchi
@ 2009-07-08 15:55 ` Peter Zijlstra
  2009-07-08 16:05   ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-08 15:55 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> I was analyzing the number of migrations in my application, and I think
> there's a bug in this accounting or, even worse, in the migration mechanism
> when used together with cpu hotplug.
> 
> I turned off all CPUs except one using the hotplug mechanism, after which I
> launched my application, which has 8 threads. Before they finish, the threads
> print their /proc/<tid>/sched files. With only 1 CPU online, there are still
> ~200 migrations per thread. The function set_task_cpu is responsible for
> updating the migration counter and is called from 9 other functions. With some
> tests I discovered that 95% of these migrations come from try_to_wake_up and
> the other 5% from pull_task and __migrate_task.
> 
> Looking at try_to_wake_up:
> 
> ....
> 	cpu = task_cpu(p);
> 	orig_cpu = cpu;
> 	this_cpu = smp_processor_id();
> 
> #ifdef CONFIG_SMP
> 	if (unlikely(task_running(rq, p)))
> 		goto out_activate;
> 
> 	cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> 	if (cpu != orig_cpu) {                          //<<<<===
> 		set_task_cpu(p, cpu);
> ....
> 
> p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> the same for rt and normal tasks. For RT, the only possible problem could be
> in find_lowest_rq, but I'm still trying to find out why. Since you have more
> experience with this code, I'd appreciate it if you could give it a look.
> 
> Is there any obscure reason why this behavior could be right?

If the task last ran on a now-unplugged cpu this would be correct; is
this indeed what happens?


* Re: possible migration bug with hotplug cpu
  2009-07-08 15:55 ` Peter Zijlstra
@ 2009-07-08 16:05   ` Lucas De Marchi
  2009-07-08 16:39     ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 16:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

No, because the tasks are started only after the CPUs have been taken offline.

Lucas De Marchi


On Wed, Jul 8, 2009 at 17:55, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> > I was analyzing the number of migrations in my application, and I think
> > there's a bug in this accounting or, even worse, in the migration mechanism
> > when used together with cpu hotplug.
> >
> > I turned off all CPUs except one using the hotplug mechanism, after which I
> > launched my application, which has 8 threads. Before they finish, the threads
> > print their /proc/<tid>/sched files. With only 1 CPU online, there are still
> > ~200 migrations per thread. The function set_task_cpu is responsible for
> > updating the migration counter and is called from 9 other functions. With some
> > tests I discovered that 95% of these migrations come from try_to_wake_up and
> > the other 5% from pull_task and __migrate_task.
> >
> > Looking at try_to_wake_up:
> >
> > ....
> >       cpu = task_cpu(p);
> >       orig_cpu = cpu;
> >       this_cpu = smp_processor_id();
> >
> > #ifdef CONFIG_SMP
> >       if (unlikely(task_running(rq, p)))
> >               goto out_activate;
> >
> >       cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> >       if (cpu != orig_cpu) {                          //<<<<===
> >               set_task_cpu(p, cpu);
> > ....
> >
> > p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> > task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> > the same for rt and normal tasks. For RT, the only possible problem could be
> > in find_lowest_rq, but I'm still trying to find out why. Since you have more
> > experience with this code, I'd appreciate it if you could give it a look.
> >
> > Is there any obscure reason why this behavior could be right?
>
> If the task last ran on a now-unplugged cpu this would be correct; is
> this indeed what happens?


* Re: possible migration bug with hotplug cpu
  2009-07-08 16:05   ` Lucas De Marchi
@ 2009-07-08 16:39     ` Lucas De Marchi
  2009-07-09 11:57       ` Lucas De Marchi
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-08 16:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

Following is a piece of /proc/<pid>/sched for an RT task, running with only
one processor online:

se.wait_count                      :                 1466
sched_info.bkl_count               :                    0
se.nr_migrations                   :                  289  <<=========
se.nr_migrations_cold              :                    0
se.nr_failed_migrations_affine     :                    0
se.nr_failed_migrations_running    :                    7
se.nr_failed_migrations_hot        :                    3
se.nr_forced_migrations            :                    1
se.nr_forced2_migrations           :                   86
se.nr_wakeups                      :               151347
se.nr_wakeups_sync                 :                  298
se.nr_wakeups_migrate              :                  265
se.nr_wakeups_local                :               150516
se.nr_wakeups_remote               :                  831
se.nr_wakeups_affine               :                  253
se.nr_wakeups_affine_attempts      :                 1092
se.nr_wakeups_passive              :                    8
se.nr_wakeups_idle                 :                    0
avg_atom                           :             0.002887
avg_per_cpu                        :             1.498609
nr_switches                        :               150001
nr_voluntary_switches              :               150001
nr_involuntary_switches            :                    0
se.load.weight                     :               177522
policy                             :                    1
prio                               :                   89  <<=========
clock-delta                        :                   84


At http://pastebin.com/pastebin.php?dl=m7c226875 there are the
/proc/sched_debug outputs from before and after running the test.


Lucas De Marchi


On Wed, Jul 8, 2009 at 18:05, Lucas De Marchi <lucas.de.marchi@gmail.com> wrote:
>
> No, because the tasks are started only after the CPUs have been taken offline.
>
> Lucas De Marchi
>
>
> On Wed, Jul 8, 2009 at 17:55, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, 2009-07-08 at 17:48 +0200, Lucas De Marchi wrote:
> > > I was analyzing the number of migrations in my application, and I think
> > > there's a bug in this accounting or, even worse, in the migration mechanism
> > > when used together with cpu hotplug.
> > >
> > > I turned off all CPUs except one using the hotplug mechanism, after which I
> > > launched my application, which has 8 threads. Before they finish, the threads
> > > print their /proc/<tid>/sched files. With only 1 CPU online, there are still
> > > ~200 migrations per thread. The function set_task_cpu is responsible for
> > > updating the migration counter and is called from 9 other functions. With some
> > > tests I discovered that 95% of these migrations come from try_to_wake_up and
> > > the other 5% from pull_task and __migrate_task.
> > >
> > > Looking at try_to_wake_up:
> > >
> > > ....
> > >       cpu = task_cpu(p);
> > >       orig_cpu = cpu;
> > >       this_cpu = smp_processor_id();
> > >
> > > #ifdef CONFIG_SMP
> > >       if (unlikely(task_running(rq, p)))
> > >               goto out_activate;
> > >
> > >       cpu = p->sched_class->select_task_rq(p, sync);  //<<<<===
> > >       if (cpu != orig_cpu) {                          //<<<<===
> > >               set_task_cpu(p, cpu);
> > > ....
> > >
> > > p->sched_class->select_task_rq(p, sync) is returning a cpu different from
> > > task_cpu(p) even though I have only 1 online CPU. In my tests this behavior is
> > > the same for rt and normal tasks. For RT, the only possible problem could be
> > > in find_lowest_rq, but I'm still trying to find out why. Since you have more
> > > experience with this code, I'd appreciate it if you could give it a look.
> > >
> > > Is there any obscure reason why this behavior could be right?
> >
> > If the task last ran on a now-unplugged cpu this would be correct; is
> > this indeed what happens?


* Re: possible migration bug with hotplug cpu
  2009-07-08 16:39     ` Lucas De Marchi
@ 2009-07-09 11:57       ` Lucas De Marchi
  2009-07-09 12:24         ` Peter Zijlstra
  2009-07-10 10:41         ` [tip:sched/urgent] sched: Reset sched stats on fork() tip-bot for Lucas De Marchi
  0 siblings, 2 replies; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-09 11:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

So, I found the problem.

These fields are currently not initialized upon fork. I noticed this when I
updated to 2.6.31-rc2: commit 6c594c21fcb02c662f11c97be4d7d2b73060a205 was
merged by Ingo (it is not yet present in 2.6.30), but it only initializes
nr_migrations. Why are the other fields not initialized to 0? Even with more
processors online, these fields may be wrong if they are not zeroed when a new
task is started. Below is a quick way to fix it; it fixed the counters for me.

What do you think of creating a struct sched_statistics embedded into
sched_entity so we could memset it all to zero at once? All fields in the
CONFIG_SCHEDSTATS section should be initialized, right?
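
Roughly, I imagine something like this (only a sketch; most of the field
list is elided and the naming is just a suggestion):

#ifdef CONFIG_SCHEDSTATS
struct sched_statistics {
	u64	wait_start;
	u64	wait_max;
	u64	wait_count;
	u64	wait_sum;
	/* ... all the other schedstat fields from sched_entity ... */
	u64	nr_wakeups_idle;
};
#endif

struct sched_entity {
	/* ... existing non-stat fields ... */
#ifdef CONFIG_SCHEDSTATS
	struct sched_statistics statistics;
#endif
};

and the whole CONFIG_SCHEDSTATS block in __sched_fork() would reduce to:

#ifdef CONFIG_SCHEDSTATS
	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
#endif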

Lucas De Marchi


diff --git a/kernel/sched.c b/kernel/sched.c
index fd3ac58..b8d75cf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2572,15 +2572,37 @@ static void __sched_fork(struct task_struct *p)
 	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;

 #ifdef CONFIG_SCHEDSTATS
-	p->se.wait_start		= 0;
-	p->se.sum_sleep_runtime		= 0;
-	p->se.sleep_start		= 0;
-	p->se.block_start		= 0;
-	p->se.sleep_max			= 0;
-	p->se.block_max			= 0;
-	p->se.exec_max			= 0;
-	p->se.slice_max			= 0;
-	p->se.wait_max			= 0;
+	p->se.wait_start			= 0;
+	p->se.wait_max				= 0;
+	p->se.wait_count			= 0;
+	p->se.wait_sum				= 0;
+
+	p->se.sleep_start			= 0;
+	p->se.sleep_max				= 0;
+	p->se.sum_sleep_runtime			= 0;
+
+	p->se.block_start			= 0;
+	p->se.block_max				= 0;
+	p->se.exec_max				= 0;
+	p->se.slice_max				= 0;
+
+	p->se.nr_migrations_cold		= 0;
+	p->se.nr_failed_migrations_affine	= 0;
+	p->se.nr_failed_migrations_running	= 0;
+	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_forced_migrations		= 0;
+	p->se.nr_forced2_migrations		= 0;
+
+	p->se.nr_wakeups			= 0;
+	p->se.nr_wakeups_sync			= 0;
+	p->se.nr_wakeups_migrate		= 0;
+	p->se.nr_wakeups_local			= 0;
+	p->se.nr_wakeups_remote			= 0;
+	p->se.nr_wakeups_affine			= 0;
+	p->se.nr_wakeups_affine_attempts	= 0;
+	p->se.nr_wakeups_passive		= 0;
+	p->se.nr_wakeups_idle			= 0;
+
 #endif

 	INIT_LIST_HEAD(&p->rt.run_list);


* Re: possible migration bug with hotplug cpu
  2009-07-09 11:57       ` Lucas De Marchi
@ 2009-07-09 12:24         ` Peter Zijlstra
  2009-07-09 12:55           ` Lucas De Marchi
  2009-07-10 10:41         ` [tip:sched/urgent] sched: Reset sched stats on fork() tip-bot for Lucas De Marchi
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-09 12:24 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Thu, 2009-07-09 at 13:57 +0200, Lucas De Marchi wrote:
> So, I found the problem.
> 
> These fields are currently not initialized upon fork. I noticed this when I
> updated to 2.6.31-rc2: commit 6c594c21fcb02c662f11c97be4d7d2b73060a205 was
> merged by Ingo (it is not yet present in 2.6.30), but it only initializes
> nr_migrations. Why are the other fields not initialized to 0? Even with more
> processors online, these fields may be wrong if they are not zeroed when a new
> task is started. Below is a quick way to fix it; it fixed the counters for me.

Ah, awesome. I hadn't had time to look in detail yet. Thanks!

> What do you think of creating a struct sched_statistics embedded into
> sched_entity so we could memset it all to zero at once? All fields in the
> CONFIG_SCHEDSTATS section should be initialized, right?

Sounds like a sane plan.

One of the things I have wanted to do for a long time is to rename
sched_entity to sched_fair_entity, and do something like:

struct sched_entity {
	struct sched_common_entity       common;

	union {
		struct sched_fair_entity fair;
		struct sched_rt_entity   rt;
	};
};

I imagine we can add struct sched_statistics to the end of this as well.

The reason I haven't come around to doing this is that it takes a bit of
time to refactor the code and find all the common bits. So it's been on
my todo list like forever.

Can I persuade you to look at this? :-)

I'll queue the below, can I add:

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>

?



* Re: possible migration bug with hotplug cpu
  2009-07-09 12:24         ` Peter Zijlstra
@ 2009-07-09 12:55           ` Lucas De Marchi
  2009-07-09 13:15             ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Lucas De Marchi @ 2009-07-09 12:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Thu, Jul 9, 2009 at 14:24, Peter Zijlstra<peterz@infradead.org> wrote:
> One of the things I have wanted to do for a long time is to rename
> sched_entity to sched_fair_entity, and do something like:
>
> struct sched_entity {
>        struct sched_common_entity       common;
>
>        union {
>                struct sched_fair_entity fair;
>                struct sched_rt_entity   rt;
>        };
> };
>
> I imagine we can add struct sched_statistics to the end of this as well.
>
> The reason I haven't come around to doing this is that it takes a bit of
> time to refactor the code and find all the common bits. So it's been on
> my todo list like forever.
>
> Can I persuade you to look at this? :-)

It looks like a very good idea. As you said, it will take some time to
refactor the code. I can look at this, but only at the end of next week.

> I'll queue the below, can I add:
>
> Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
>
> ?

Sure. I'm not very used to it yet... do I re-send the email
or do you add it yourself?

thanks

Lucas De Marchi


* Re: possible migration bug with hotplug cpu
  2009-07-09 12:55           ` Lucas De Marchi
@ 2009-07-09 13:15             ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-07-09 13:15 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Ingo Molnar, linux-kernel

On Thu, 2009-07-09 at 14:55 +0200, Lucas De Marchi wrote:
> On Thu, Jul 9, 2009 at 14:24, Peter Zijlstra<peterz@infradead.org> wrote:
> > One of the things I have wanted to do for a long time is to rename
> > sched_entity to sched_fair_entity, and do something like:
> >
> > struct sched_entity {
> >        struct sched_common_entity       common;
> >
> >        union {
> >                struct sched_fair_entity fair;
> >                struct sched_rt_entity   rt;
> >        };
> > };
> >
> > I imagine we can add struct sched_statistics to the end of this as well.
> >
> > The reason I haven't come around to doing this is that it takes a bit of
> > time to refactor the code and find all the common bits. So it's been on
> > my todo list like forever.
> >
> > Can I persuade you to look at this? :-)
> 
> It looks like a very good idea. As you said, it will take some time to
> refactor the code. I can look at this, but only at the end of next week.

Sure, no hurry. It's been without that for like forever, so a few more
weeks won't harm anybody, thanks!

> > I'll queue the below, can I add:
> >
> > Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
> >
> > ?
> 
> Sure. I'm not very used to it yet... do I re-send the email
> or do you add it yourself?

I've added it, simply remember to add that for future patches :-)


* [tip:sched/urgent] sched: Reset sched stats on fork()
  2009-07-09 11:57       ` Lucas De Marchi
  2009-07-09 12:24         ` Peter Zijlstra
@ 2009-07-10 10:41         ` tip-bot for Lucas De Marchi
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Lucas De Marchi @ 2009-07-10 10:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, lucas.de.marchi, tglx, mingo

Commit-ID:  7793527b90d9418211f4fe8464cc1dcb1631ea1b
Gitweb:     http://git.kernel.org/tip/7793527b90d9418211f4fe8464cc1dcb1631ea1b
Author:     Lucas De Marchi <lucas.de.marchi@gmail.com>
AuthorDate: Thu, 9 Jul 2009 13:57:20 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 10 Jul 2009 10:43:29 +0200

sched: Reset sched stats on fork()

The sched_stat fields are currently not reset upon fork.
Ingo's recent commit 6c594c21fcb02c662f11c97be4d7d2b73060a205
did reset nr_migrations, but it didn't reset any of the
others.

This patch resets all sched_stat fields on fork.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched.c |   40 +++++++++++++++++++++++++++++++---------
 1 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index a17f3d9..c4549bd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2572,15 +2572,37 @@ static void __sched_fork(struct task_struct *p)
 	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;
 
 #ifdef CONFIG_SCHEDSTATS
-	p->se.wait_start		= 0;
-	p->se.sum_sleep_runtime		= 0;
-	p->se.sleep_start		= 0;
-	p->se.block_start		= 0;
-	p->se.sleep_max			= 0;
-	p->se.block_max			= 0;
-	p->se.exec_max			= 0;
-	p->se.slice_max			= 0;
-	p->se.wait_max			= 0;
+	p->se.wait_start			= 0;
+	p->se.wait_max				= 0;
+	p->se.wait_count			= 0;
+	p->se.wait_sum				= 0;
+
+	p->se.sleep_start			= 0;
+	p->se.sleep_max				= 0;
+	p->se.sum_sleep_runtime			= 0;
+
+	p->se.block_start			= 0;
+	p->se.block_max				= 0;
+	p->se.exec_max				= 0;
+	p->se.slice_max				= 0;
+
+	p->se.nr_migrations_cold		= 0;
+	p->se.nr_failed_migrations_affine	= 0;
+	p->se.nr_failed_migrations_running	= 0;
+	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_forced_migrations		= 0;
+	p->se.nr_forced2_migrations		= 0;
+
+	p->se.nr_wakeups			= 0;
+	p->se.nr_wakeups_sync			= 0;
+	p->se.nr_wakeups_migrate		= 0;
+	p->se.nr_wakeups_local			= 0;
+	p->se.nr_wakeups_remote			= 0;
+	p->se.nr_wakeups_affine			= 0;
+	p->se.nr_wakeups_affine_attempts	= 0;
+	p->se.nr_wakeups_passive		= 0;
+	p->se.nr_wakeups_idle			= 0;
+
 #endif
 
 	INIT_LIST_HEAD(&p->rt.run_list);


