* clock drift in set_task_cpu()
@ 2010-07-21 11:40 Jack Daniel
  2010-07-22  5:34 ` Jack Daniel
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-21 11:40 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: LKML

Hi Peter/Ingo,

I have a query about kernel code that was changed not too long ago,
in v2.6.33-rc1, commit 5afcdab706d6002cb02b567ba46e650215e694e8
[tip:sched/urgent] sched: Remove rq->clock coupling from
set_task_cpu()

void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
{
	int old_cpu = task_cpu(p);
	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
	u64 clock_offset;

	clock_offset = old_rq->clock - new_rq->clock;
---

On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
larger than old_rq->clock, so the u64 clock_offset wraps around,
leading to incorrect values. You have very correctly noted in the
commit header that all functions that access set_task_cpu() must do
so after a call to sched_clock_remote(); in this case the function is
sched_fork(). I validated this by adding update_rq_clock(old_rq); into
set_task_cpu(), and that seems to fix the issue. But I noticed that
since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
(sched_clock_stable) in sched_clock_cpu() will evaluate to true and
the flow never reaches sched_clock_remote() or sched_clock_local().
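
To illustrate the wrap, a minimal user-space sketch (with made-up
clock values, not from an actual trace):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* hypothetical readings where the new rq's clock runs ahead */
	uint64_t old_rq_clock = 1000000;
	uint64_t new_rq_clock = 1000500;

	/* old - new underflows: u64 arithmetic wraps modulo 2^64 */
	uint64_t clock_offset = old_rq_clock - new_rq_clock;

	/* prints 18446744073709551116, a huge bogus offset */
	printf("clock_offset = %llu\n", (unsigned long long)clock_offset);
	return 0;
}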

What do you think is the best way to approach the problem *assuming
the older kernel*, since I believe the problem still exists there?
That is, how do we reinstate your axiom ".... which should ensure the
observed time between these two cpus is monotonic"?

1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is set
by default for x86.
2) Does one create a new function with just this pseudocode (a rough
sketch)?

static u64 fix_clock_drift(int cpu, struct sched_clock_data *scd)
{
	u64 clock;

	if (cpu != smp_processor_id())
		clock = sched_clock_remote(scd);
	else
		clock = sched_clock_local(scd);

	return clock;
}
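
For reference, option 2 essentially mirrors the dispatch that
sched_clock_cpu() itself performs when the stable-clock fast path is
not taken; roughly (paraphrased from memory, so worth double-checking
against kernel/sched_clock.c of that era):

u64 sched_clock_cpu(int cpu)
{
	struct sched_clock_data *scd;
	u64 clock;

	if (sched_clock_stable)
		return sched_clock();	/* fast path: everything below is skipped */

	scd = cpu_sdc(cpu);

	if (cpu != smp_processor_id())
		clock = sched_clock_remote(scd);
	else
		clock = sched_clock_local(scd);

	return clock;
}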

Thanks and regards,
Jack

* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
@ 2010-07-22  5:34 ` Jack Daniel
  2010-07-30 11:55 ` Jack Daniel
  2010-08-05  9:58 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-22  5:34 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar

Greetings,

I would be much obliged if anyone could answer my query below. Any
guidance or advice is much appreciated. I believe the problem still
exists in newer kernel versions.

Thanks and regards,
Jack

On Wed, Jul 21, 2010 at 5:10 PM, Jack Daniel <wanders.thirst@gmail.com> wrote:
> Hi Peter/Ingo,
>
> I have a query about kernel code that was changed not too long ago,
> in v2.6.33-rc1, commit 5afcdab706d6002cb02b567ba46e650215e694e8
> [tip:sched/urgent] sched: Remove rq->clock coupling from
> set_task_cpu()
>
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
> 	int old_cpu = task_cpu(p);
> 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> 	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
> 		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
> 	u64 clock_offset;
>
> 	clock_offset = old_rq->clock - new_rq->clock;
> ---
>
> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
> larger than old_rq->clock, so the u64 clock_offset wraps around,
> leading to incorrect values. You have very correctly noted in the
> commit header that all functions that access set_task_cpu() must do
> so after a call to sched_clock_remote(); in this case the function is
> sched_fork(). I validated this by adding update_rq_clock(old_rq); into
> set_task_cpu(), and that seems to fix the issue. But I noticed that
> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
> the flow never reaches sched_clock_remote() or sched_clock_local().
>
> What do you think is the best way to approach the problem *assuming
> the older kernel*, since I believe the problem still exists there?
> That is, how do we reinstate your axiom ".... which should ensure the
> observed time between these two cpus is monotonic"?
>
> 1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is set
> by default for x86.
> 2) Does one create a new function with just this pseudocode (a rough
> sketch)?
>
> static u64 fix_clock_drift(int cpu, struct sched_clock_data *scd)
> {
> 	u64 clock;
>
> 	if (cpu != smp_processor_id())
> 		clock = sched_clock_remote(scd);
> 	else
> 		clock = sched_clock_local(scd);
>
> 	return clock;
> }
>
> Thanks and regards,
> Jack
>

* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
  2010-07-22  5:34 ` Jack Daniel
@ 2010-07-30 11:55 ` Jack Daniel
  2010-08-05  9:58 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-30 11:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: LKML

Hi Peter,

As a follow-up on this...

On Wed, Jul 21, 2010 at 5:10 PM, Jack Daniel <wanders.thirst@gmail.com> wrote:
> Hi Peter/Ingo,
>
> I have a query about kernel code that was changed not too long ago,
> in v2.6.33-rc1, commit 5afcdab706d6002cb02b567ba46e650215e694e8
> [tip:sched/urgent] sched: Remove rq->clock coupling from
> set_task_cpu()
>
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
> 	int old_cpu = task_cpu(p);
> 	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
> 	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
> 		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
> 	u64 clock_offset;
>
> 	clock_offset = old_rq->clock - new_rq->clock;
> ---
>
> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
> larger than old_rq->clock, so the u64 clock_offset wraps around,
> leading to incorrect values. You have very correctly noted in the
> commit header that all functions that access set_task_cpu() must do
> so after a call to sched_clock_remote(); in this case the function is
> sched_fork(). I validated this by adding update_rq_clock(old_rq); into
> set_task_cpu(), and that seems to fix the issue. But I noticed that
> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
> the flow never reaches sched_clock_remote() or sched_clock_local().
>
> What do you think is the best way to approach the problem *assuming
> the older kernel*, since I believe the problem still exists there?
> That is, how do we reinstate your axiom ".... which should ensure the
> observed time between these two cpus is monotonic"?
>
> 1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is set
> by default for x86.
> 2) Does one create a new function with just this pseudocode (a rough
> sketch)?
>
> static u64 fix_clock_drift(int cpu, struct sched_clock_data *scd)
> {
> 	u64 clock;
>
> 	if (cpu != smp_processor_id())
> 		clock = sched_clock_remote(scd);
> 	else
> 		clock = sched_clock_local(scd);
>
> 	return clock;
> }
>

I bet you came across this problem and hence chose to surgically
remove the offending code with commit 5afcdab. I now think that was a
good choice, but the right thing would have been to correct the
problem itself. I think this code should have solved the problem:

diff --git a/kernel/sched.c b/kernel/sched.c
index 1d39b00..5fd63f2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2068,6 +2068,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
        struct cfs_rq *old_cfsrq = task_cfs_rq(p),
                      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
        u64 clock_offset;
+       unsigned long flags;
+
+       rmb();
+       local_irq_save(flags);
+       update_rq_clock(old_rq);
+       update_rq_clock(new_rq);
+       local_irq_restore(flags);


Thanks and regards,
Jack

* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
  2010-07-22  5:34 ` Jack Daniel
  2010-07-30 11:55 ` Jack Daniel
@ 2010-08-05  9:58 ` Peter Zijlstra
  2010-08-09 13:17   ` Jack Daniel
  2010-08-20 14:16   ` [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2010-08-05  9:58 UTC (permalink / raw)
  To: Jack Daniel; +Cc: Ingo Molnar, LKML, Mike Galbraith

On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
> larger than old_rq->clock, so the u64 clock_offset wraps around,
> leading to incorrect values.

Which values become incorrect; do you observe vruntime funnies, or
only the schedstat values?

>  You have very correctly noted in
> the commit header that all functions that access set_task_cpu() must
> do so after a call to sched_clock_remote(); in this case the function
> is sched_fork(). I validated this by adding update_rq_clock(old_rq);
> into set_task_cpu(), and that seems to fix the issue.

Ah, so the problem is that task_fork_fair() does the task placement
without updated rq clocks? In which case I think we should at least do
an update_rq_clock(rq) in there (see the below patch).

> But I noticed that
> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
> the flow never reaches sched_clock_remote() or sched_clock_local().

sched_clock_stable being true implies the clock is stable across cores
and thus it shouldn't matter. Or are you saying you're seeing it being
set and still have issues?

> I bet you came across this problem and hence chose to surgically
> remove the offending code with commit 5afcdab. I now think that was a
> good choice, but the right thing would have been to correct the
> problem itself. I think this code should have solved the problem:
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 1d39b00..5fd63f2 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2068,6 +2068,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>         struct cfs_rq *old_cfsrq = task_cfs_rq(p),
>                       *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>         u64 clock_offset;
> +       unsigned long flags;
> +
> +       rmb();
> +       local_irq_save(flags);
> +       update_rq_clock(old_rq);
> +       update_rq_clock(new_rq);
> +       local_irq_restore(flags);

The problem here is that your patch introduces the exact race I was
trying to close. We can only access rq->clock while also holding the
appropriate rq->lock; disabling IRQs, while also required, is not
sufficient.

[ Also, that rmb() looks just plain wrong ]
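
In sketch form, the rule would look like this hypothetical helper (an
assumed pattern for illustration, not an existing kernel function):

static u64 read_rq_clock(struct rq *rq)
{
	unsigned long flags;
	u64 clock;

	/*
	 * rq->clock may only be read or updated under rq->lock;
	 * IRQs-off alone does not stop another CPU from taking the
	 * lock and moving the clock.
	 */
	raw_spin_lock_irqsave(&rq->lock, flags);
	update_rq_clock(rq);
	clock = rq->clock;
	raw_spin_unlock_irqrestore(&rq->lock, flags);

	return clock;
}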

Anyway, does the below cure your trouble? (Also, could you describe
the trouble you actually observed in more detail?)

---

Subject: sched: ensure rq->clock gets sync'ed when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair() ensure we
  update_rq_clock() so we work with current time. We leave the vruntime
  in relative state, so the time delay until wake_up_new_task() doesn't
  matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
  relative state we can safely migrate, the activate_task() on the
  remote rq will call update_rq_clock() and causes the clock to be
  synced (enough).

try_to_wake_up() -- In case we'll migrate, we need to update the old
  rq->clock, the activate_task() in ttwu_activate() will already update
  the new rq->clock, and thus the clocks will get sync'ed.

load-balance -- Migrating running tasks always happens with both rq's
  locked, either through double_rq_lock() or double_lock_balance(). So
  sync the rq->clock there.

The problem seems to be that this all could result in too many rq->clock
updates, which are expensive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c      |   20 ++++++++++++++++++--
 kernel/sched_fair.c |    2 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index d3c0262..69584b4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1756,13 +1756,22 @@ static int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
  */
 static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
 {
+	int ret;
+
+#ifdef CONFIG_SCHED_DEBUG
 	if (unlikely(!irqs_disabled())) {
 		/* printk() doesn't work good under rq->lock */
 		raw_spin_unlock(&this_rq->lock);
 		BUG_ON(1);
 	}
+#endif
+
+	ret = _double_lock_balance(this_rq, busiest);
+
+	update_rq_clock(this_rq);
+	update_rq_clock(busiest);
 
-	return _double_lock_balance(this_rq, busiest);
+	return ret;
 }
 
 static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
@@ -1782,7 +1791,9 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
 	__acquires(rq1->lock)
 	__acquires(rq2->lock)
 {
+#ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(!irqs_disabled());
+#endif
 	if (rq1 == rq2) {
 		raw_spin_lock(&rq1->lock);
 		__acquire(rq2->lock);	/* Fake it out ;) */
@@ -1795,6 +1806,9 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
 			raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
 		}
 	}
+
+	update_rq_clock(rq1);
+	update_rq_clock(rq2);
 }
 
 /*
@@ -2395,8 +2409,10 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	}
 
 	cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
-	if (cpu != orig_cpu)
+	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
+		update_rq_clock(rq);
+	}
 	__task_rq_unlock(rq);
 
 	rq = cpu_rq(cpu);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..f816e74 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
 
+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);
 


* Re: clock drift in set_task_cpu()
  2010-08-05  9:58 ` Peter Zijlstra
@ 2010-08-09 13:17   ` Jack Daniel
  2010-08-09 14:56     ` Philby John
  2010-08-20 14:16   ` [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Jack Daniel @ 2010-08-09 13:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Mike Galbraith

On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
>> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
>> larger than old_rq->clock, so the u64 clock_offset wraps around,
>> leading to incorrect values.
>
> Which values become incorrect; do you observe vruntime funnies, or
> only the schedstat values?

Just the schedstat values; I did not observe anything wrong with vruntime
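(i.e., the counters exported via /proc/schedstat when CONFIG_SCHEDSTATS
is enabled).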

>
>>  You have very correctly noted in
>> the commit header that all functions that access set_task_cpu() must
>> do so after a call to sched_clock_remote(); in this case the function
>> is sched_fork(). I validated this by adding update_rq_clock(old_rq);
>> into set_task_cpu(), and that seems to fix the issue.
>
> Ah, so the problem is that task_fork_fair() does the task placement
> without updated rq clocks? In which case I think we should at least do
> an update_rq_clock(rq) in there (see the below patch).

Yes, this is indeed the problem and your patch seems to fix the issue.

>
>> But I noticed that
>> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
>> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
>> the flow never reaches sched_clock_remote() or sched_clock_local().
>
> sched_clock_stable being true implies the clock is stable across cores
> and thus it shouldn't matter. Or are you saying you're seeing it being
> set and still have issues?
>

Please ignore these comments; initial debugging set me on the wrong
foot, suggesting that the TSC is unstable.

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..f816e74 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
>
>        raw_spin_lock_irqsave(&rq->lock, flags);
>
> +       update_rq_clock(rq);

As you rightly pointed out above, updating the clock in
task_fork_fair() fixes the issue. We can get rid of the rest of the
update_rq_clock() calls, since (like you said) they are expensive; I
tested with them commented out.

Thanks,
Jack

* Re: clock drift in set_task_cpu()
  2010-08-09 13:17   ` Jack Daniel
@ 2010-08-09 14:56     ` Philby John
  2010-09-06  6:34       ` Jack Daniel
  0 siblings, 1 reply; 9+ messages in thread
From: Philby John @ 2010-08-09 14:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jack Daniel, Ingo Molnar, LKML, Mike Galbraith

On Mon, 2010-08-09 at 18:47 +0530, Jack Daniel wrote:
> On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
> >> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
> >> larger than old_rq->clock, so the u64 clock_offset wraps around,
> >> leading to incorrect values.
> >
> > Which values become incorrect; do you observe vruntime funnies, or
> > only the schedstat values?
> 
> Just the schedstat values; I did not observe anything wrong with vruntime.
> 
> >
> >>  You have very correctly noted in
> >> the commit header that all functions that access set_task_cpu() must
> >> do so after a call to sched_clock_remote(); in this case the function
> >> is sched_fork(). I validated this by adding update_rq_clock(old_rq);
> >> into set_task_cpu(), and that seems to fix the issue.
> >
> > Ah, so the problem is that task_fork_fair() does the task placement
> > without updated rq clocks? In which case I think we should at least do
> > an update_rq_clock(rq) in there (see the below patch).
> 
> Yes, this is indeed the problem and your patch seems to fix the issue.
> 
> >
> >> But I noticed that
> >> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> >> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
> >> the flow never reaches sched_clock_remote() or sched_clock_local().
> >
> > sched_clock_stable being true implies the clock is stable across cores
> > and thus it shouldn't matter. Or are you saying you're seeing it being
> > set and still have issues?
> >
> 
> Please ignore these comments; initial debugging set me on the wrong
> foot, suggesting that the TSC is unstable.
> 
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 9910e1b..f816e74 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
> >
> >        raw_spin_lock_irqsave(&rq->lock, flags);
> >
> > +       update_rq_clock(rq);
> 
> As you rightly pointed out above, updating the clock in
> task_fork_fair() fixes the issue. We can get rid of the rest of the
> update_rq_clock() calls, since (like you said) they are expensive; I
> tested with them commented out.

From 1bc695bc2ac6c941724953b29f6c18196a474b8f Mon Sep 17 00:00:00 2001
From: Philby John <pjohn@mvista.com>
Date: Mon, 9 Aug 2010 18:19:08 +0530
Subject: [PATCH] sched: ensure rq->clock gets sync'ed when migrating tasks

In sched_fork(), when we do task placement in ->task_fork_fair(),
ensure we update_rq_clock() so we work with current time. This has
been noted and verified on an Intel Greencity (Xeon 55xx).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Philby John <pjohn@mvista.com>
---
 kernel/sched_fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..48bc31c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3751,7 +3751,7 @@ static void task_fork_fair(struct task_struct *p)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-
+	update_rq_clock(rq);
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);
 
-- 
1.6.3.3.333.g4d53f

* [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks
  2010-08-05  9:58 ` Peter Zijlstra
  2010-08-09 13:17   ` Jack Daniel
@ 2010-08-20 14:16   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-08-20 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, pjohn,
	wanders.thirst, tglx, mingo

Commit-ID:  861d034ee814917a83bd5de4b26e3b8336ddeeb8
Gitweb:     http://git.kernel.org/tip/861d034ee814917a83bd5de4b26e3b8336ddeeb8
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 19 Aug 2010 13:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 14:59:01 +0200

sched: Fix rq->clock synchronization when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair() ensure we
  update_rq_clock() so we work with current time. We leave the vruntime
  in relative state, so the time delay until wake_up_new_task() doesn't
  matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
  relative state we can safely migrate, the activate_task() on the
  remote rq will call update_rq_clock() and causes the clock to be
  synced (enough).

Tested-by: Jack Daniel <wanders.thirst@gmail.com>
Tested-by: Philby John <pjohn@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281002322.1923.1708.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..ab661eb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3752,6 +3752,8 @@ static void task_fork_fair(struct task_struct *p)
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
 
+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);
 

* Re: clock drift in set_task_cpu()
  2010-08-09 14:56     ` Philby John
@ 2010-09-06  6:34       ` Jack Daniel
  2010-09-06  6:52         ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Jack Daniel @ 2010-09-06  6:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, LKML, Mike Galbraith, pjohn

Hi Ingo,

On Mon, Aug 9, 2010 at 8:26 PM, Philby John <pjohn@mvista.com> wrote:
> On Mon, 2010-08-09 at 18:47 +0530, Jack Daniel wrote:
>> On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
>> >> On a Xeon 55xx with 8 CPUs, I found that new_rq->clock is sometimes
>> >> larger than old_rq->clock, so the u64 clock_offset wraps around,
>> >> leading to incorrect values.
>> >
>> > Which values become incorrect; do you observe vruntime funnies, or
>> > only the schedstat values?
>>
>> Just the schedstat values; I did not observe anything wrong with vruntime.
>>
>> >
>> >>  You have very correctly noted in
>> >> the commit header that all functions that access set_task_cpu() must
>> >> do so after a call to sched_clock_remote(); in this case the function
>> >> is sched_fork(). I validated this by adding update_rq_clock(old_rq);
>> >> into set_task_cpu(), and that seems to fix the issue.
>> >
>> > Ah, so the problem is that task_fork_fair() does the task placement
>> > without updated rq clocks? In which case I think we should at least do
>> > an update_rq_clock(rq) in there (see the below patch).
>>
>> Yes, this is indeed the problem and your patch seems to fix the issue.
>>
>> >
>> >> But I noticed that
>> >> since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
>> >> (sched_clock_stable) in sched_clock_cpu() will evaluate to true and
>> >> the flow never reaches sched_clock_remote() or sched_clock_local().
>> >
>> > sched_clock_stable being true implies the clock is stable across cores
>> > and thus it shouldn't matter. Or are you saying you're seeing it being
>> > set and still have issues?
>> >
>>
>> Please ignore these comments; initial debugging set me on the wrong
>> foot, suggesting that the TSC is unstable.
>>
>> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> > index 9910e1b..f816e74 100644
>> > --- a/kernel/sched_fair.c
>> > +++ b/kernel/sched_fair.c
>> > @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
>> >
>> >        raw_spin_lock_irqsave(&rq->lock, flags);
>> >
>> > +       update_rq_clock(rq);
>>
>> As you rightly pointed out above, updating the clock in
>> task_fork_fair() fixes the issue. We can get rid of the rest of the
>> update_rq_clock() calls, since (like you said) they are expensive; I
>> tested with them commented out.
>
> From 1bc695bc2ac6c941724953b29f6c18196a474b8f Mon Sep 17 00:00:00 2001
> From: Philby John <pjohn@mvista.com>
> Date: Mon, 9 Aug 2010 18:19:08 +0530
> Subject: [PATCH] sched: ensure rq->clock gets sync'ed when migrating tasks
>
> In sched_fork(), when we do task placement in ->task_fork_fair(),
> ensure we update_rq_clock() so we work with current time. This has
> been noted and verified on an Intel Greencity (Xeon 55xx).
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Philby John <pjohn@mvista.com>
> ---
>  kernel/sched_fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 806d1b2..48bc31c 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3751,7 +3751,7 @@ static void task_fork_fair(struct task_struct *p)
>        unsigned long flags;
>
>        raw_spin_lock_irqsave(&rq->lock, flags);
> -
> +       update_rq_clock(rq);
>        if (unlikely(task_cpu(p) != this_cpu))
>                __set_task_cpu(p, this_cpu);
>

Any chance that you will be pulling in this fix?

Regards,
Jack

* Re: clock drift in set_task_cpu()
  2010-09-06  6:34       ` Jack Daniel
@ 2010-09-06  6:52         ` Ingo Molnar
  0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2010-09-06  6:52 UTC (permalink / raw)
  To: Jack Daniel; +Cc: Peter Zijlstra, LKML, Mike Galbraith, pjohn


* Jack Daniel <wanders.thirst@gmail.com> wrote:

> Hi Ingo,
>
> [...]
> 
> Any chance that you will be pulling in this fix ?

It's already upstream, as of v2.6.36-rc3. See the commit notification 
email below from Aug 19.

Thanks,

	Ingo

----- Forwarded message from tip-bot for Peter Zijlstra <peterz@infradead.org> -----

Date: Fri, 20 Aug 2010 14:16:14 GMT
From: tip-bot for Peter Zijlstra <peterz@infradead.org>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
	a.p.zijlstra@chello.nl, peterz@infradead.org, pjohn@mvista.com,
	wanders.thirst@gmail.com, tglx@linutronix.de, mingo@elte.hu
Subject: [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks

Commit-ID:  861d034ee814917a83bd5de4b26e3b8336ddeeb8
Gitweb:     http://git.kernel.org/tip/861d034ee814917a83bd5de4b26e3b8336ddeeb8
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 19 Aug 2010 13:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 14:59:01 +0200

sched: Fix rq->clock synchronization when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair() ensure we
  update_rq_clock() so we work with current time. We leave the vruntime
  in relative state, so the time delay until wake_up_new_task() doesn't
  matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
  relative state we can safely migrate, the activate_task() on the
  remote rq will call update_rq_clock() and causes the clock to be
  synced (enough).

Tested-by: Jack Daniel <wanders.thirst@gmail.com>
Tested-by: Philby John <pjohn@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281002322.1923.1708.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..ab661eb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3752,6 +3752,8 @@ static void task_fork_fair(struct task_struct *p)
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
 
+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);
 

----- End forwarded message -----

Thread overview: 9+ messages
2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
2010-07-22  5:34 ` Jack Daniel
2010-07-30 11:55 ` Jack Daniel
2010-08-05  9:58 ` Peter Zijlstra
2010-08-09 13:17   ` Jack Daniel
2010-08-09 14:56     ` Philby John
2010-09-06  6:34       ` Jack Daniel
2010-09-06  6:52         ` Ingo Molnar
2010-08-20 14:16   ` [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks tip-bot for Peter Zijlstra
