linux-kernel.vger.kernel.org archive mirror
* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
@ 2016-05-24 17:04 Paul E. McKenney
  2016-05-25 17:49 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2016-05-24 17:04 UTC (permalink / raw)
  To: peterz
  Cc: umgwanakikbuti, mingo, linux-kernel, bsegall, matt,
	morten.rasmussen, pjt, tglx, byungchul.park, ahh

On Mon, May 23, 2016 at 2:19 AM +0200, Peter Zijlstra wrote:
> On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > >
> > > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > > and likely take a size XXL latency hit.  Big box running master bled
> > > > profusely under heavy load until I turned TTWU_QUEUE off.
> >
> > May as well make it official and against master.today.  Fly or die
> > little patchlet.
> >
> > sched/fair: Move se->vruntime normalization state into struct sched_entity
> 
> Does this work?

This gets rid of the additional lost wakeups introduced during the
merge window, thank you!

The pre-existing low-probability lost wakeups still persist, sad to say.
Can't have everything, I guess.

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/core.c   | 18 +++++++++++-------
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1b43b45a22b9..a2001e01b3df 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1534,6 +1534,7 @@ struct task_struct {
>         unsigned sched_reset_on_fork:1;
>         unsigned sched_contributes_to_load:1;
>         unsigned sched_migrated:1;
> +       unsigned sched_remote_wakeup:1;
>         unsigned :0; /* force alignment to the next boundary */
> 
>         /* unserialized, strictly 'current' */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 404c0784b1fc..7f2cae4620c7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1768,13 +1768,15 @@ void sched_ttwu_pending(void)
>         cookie = lockdep_pin_lock(&rq->lock);
> 
>         while (llist) {
> +               int wake_flags = 0;
> +
>                 p = llist_entry(llist, struct task_struct, wake_entry);
>                 llist = llist_next(llist);
> -               /*
> -                * See ttwu_queue(); we only call ttwu_queue_remote() when
> -                * its a x-cpu wakeup.
> -                */
> -               ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
> +
> +               if (p->sched_remote_wakeup)
> +                       wake_flags = WF_MIGRATED;
> +
> +               ttwu_do_activate(rq, p, wake_flags, cookie);
>         }
> 
>         lockdep_unpin_lock(&rq->lock, cookie);
> @@ -1819,10 +1821,12 @@ void scheduler_ipi(void)
>         irq_exit();
>  }
> 
> -static void ttwu_queue_remote(struct task_struct *p, int cpu)
> +static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
>  {
>         struct rq *rq = cpu_rq(cpu);
> 
> +       p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
> +
>         if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
>                 if (!set_nr_if_polling(rq->idle))
>                         smp_send_reschedule(cpu);
> @@ -1869,7 +1873,7 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>  #if defined(CONFIG_SMP)
>         if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
>                 sched_clock_cpu(cpu); /* sync clocks x-cpu */
> -               ttwu_queue_remote(p, cpu);
> +               ttwu_queue_remote(p, cpu, wake_flags);
>                 return;
>         }
>  #endif

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-24 17:04 [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity Paul E. McKenney
@ 2016-05-25 17:49 ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2016-05-25 17:49 UTC (permalink / raw)
  To: peterz
  Cc: umgwanakikbuti, mingo, linux-kernel, bsegall, matt,
	morten.rasmussen, pjt, tglx, byungchul.park, ahh

On Tue, May 24, 2016 at 10:04:17AM -0700, Paul E. McKenney wrote:
> On Mon, May 23, 2016 at 2:19 AM +0200, Peter Zijlstra wrote:
> > On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> > > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > > >
> > > > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > > > and likely take a size XXL latency hit.  Big box running master bled
> > > > > profusely under heavy load until I turned TTWU_QUEUE off.
> > >
> > > May as well make it official and against master.today.  Fly or die
> > > little patchlet.
> > >
> > > sched/fair: Move se->vruntime normalization state into struct sched_entity
> > 
> > Does this work?
> 
> This gets rid of the additional lost wakeups introduced during the
> merge window, thank you!
> 
> The pre-existing low-probability lost wakeups still persist, sad to say.
> Can't have everything, I guess.

However, their behavior has changed.  Previously, the TREE03 rcutorture
scenario had the most stalls.  Now TREE07 appears to be the more likely
one.  The corresponding Kconfig fragments are shown below, in case they
give someone ideas about where the problem might be.

My current guess, based on insufficient data, is that TREE07 has about
an 80-90% chance of seeing a lost wakeup during a two-hour run.

							Thanx, Paul
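Back of the envelope: if one assumes the lost wakeups arrive as a Poisson process (an assumption, not something Paul states), the quoted 80-90% chance per two-hour run pins down the underlying event rate:

```python
import math

def rate_per_hour(p_fail: float, hours: float = 2.0) -> float:
    """Poisson rate implied by failure probability p_fail over `hours`.

    Solves P(at least one event) = 1 - exp(-rate * hours) for rate.
    """
    return -math.log(1.0 - p_fail) / hours

for p in (0.8, 0.9):
    print(f"P={p:.0%}: ~{rate_per_hour(p):.2f} events/hour")
```

That works out to roughly 0.8 to 1.15 lost wakeups per hour, i.e. a two-hour run is a reasonable reproduction window under this model.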

------------------------------------------------------------------------
TREE03
------------------------------------------------------------------------
CONFIG_SMP=y
CONFIG_NR_CPUS=16
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
#CHECK#CONFIG_PREEMPT_RCU=y
CONFIG_HZ_PERIODIC=y
CONFIG_NO_HZ_IDLE=n
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_TRACE=y
CONFIG_HOTPLUG_CPU=y
CONFIG_RCU_FANOUT=2
CONFIG_RCU_FANOUT_LEAF=2
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_BOOST=y
CONFIG_RCU_KTHREAD_PRIO=2
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y

------------------------------------------------------------------------
TREE07
------------------------------------------------------------------------
CONFIG_SMP=y
CONFIG_NR_CPUS=16
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
#CHECK#CONFIG_TREE_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=n
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=n
CONFIG_NO_HZ_FULL_SYSIDLE=y
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_RCU_TRACE=y
CONFIG_HOTPLUG_CPU=y
CONFIG_RCU_FANOUT=2
CONFIG_RCU_FANOUT_LEAF=2
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-23  9:40           ` Mike Galbraith
  2016-05-23 10:13             ` Wanpeng Li
@ 2016-05-23 12:28             ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-05-23 12:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Mon, May 23, 2016 at 11:40:35AM +0200, Mike Galbraith wrote:

> Yup, bugs--.  Kinda funny, I considered ~this way first, but thought
> you'd not that approach.. dang, got it back-assward ;-)

Hehe, so the new flag is in a word we've already written to on this path
and it avoids growing the structure (by an undefined number of bytes,
see gripes about sizeof(bool)).

I am thinking of doing the thing you did for a debug aid -- then again;
if the Google guys would finally get the global vruntime patches sorted,
we can go and kill all this code.


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-23 10:13             ` Wanpeng Li
@ 2016-05-23 10:26               ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2016-05-23 10:26 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Pavan Kondeti,
	Ben Segall, Matt Fleming, Morten Rasmussen, Paul Turner,
	Thomas Gleixner, Byungchul Park, Andrew Hunter

On Mon, 2016-05-23 at 18:13 +0800, Wanpeng Li wrote:
> 2016-05-23 17:40 GMT+08:00 Mike Galbraith <umgwanakikbuti@gmail.com>:
> > On Mon, 2016-05-23 at 11:19 +0200, Peter Zijlstra wrote:
> > > On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> > > > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > > > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > > > > 
> > > > > > Wakees that were not migrated/normalized eat an unwanted
> > > > > > min_vruntime,
> > > > > > and likely take a size XXL latency hit.  Big box running
> > > > > > master bled
> > > > > > profusely under heavy load until I turned TTWU_QUEUE off.
> > > > 
> > > > May as well make it official and against master.today.  Fly or
> > > > die
> > > > little patchlet.
> > > > 
> > > > sched/fair: Move se->vruntime normalization state into struct
> > > > sched_entity
> > > 
> > > Does this work?
> > 
> > Yup, bugs--.  Kinda funny, I considered ~this way first, but
> > thought
> > you'd not that approach.. dang, got it back-assward ;-)
> > 
> 
> Nicer this one.

Well, the wakeup is a remote wakeup whether the flag is set or not, so
in that regard it's not _squeaky_ clean, but is dinky, works (yay)..
and doesn't invent helpers with funny names vs what they actually do.

	-Mike


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-23  9:40           ` Mike Galbraith
@ 2016-05-23 10:13             ` Wanpeng Li
  2016-05-23 10:26               ` Mike Galbraith
  2016-05-23 12:28             ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2016-05-23 10:13 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Pavan Kondeti,
	Ben Segall, Matt Fleming, Morten Rasmussen, Paul Turner,
	Thomas Gleixner, Byungchul Park, Andrew Hunter

2016-05-23 17:40 GMT+08:00 Mike Galbraith <umgwanakikbuti@gmail.com>:
> On Mon, 2016-05-23 at 11:19 +0200, Peter Zijlstra wrote:
>> On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
>> > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
>> > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
>> > >
>> > > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
>> > > > and likely take a size XXL latency hit.  Big box running master bled
>> > > > profusely under heavy load until I turned TTWU_QUEUE off.
>> >
>> > May as well make it official and against master.today.  Fly or die
>> > little patchlet.
>> >
>> > sched/fair: Move se->vruntime normalization state into struct sched_entity
>>
>> Does this work?
>
> Yup, bugs--.  Kinda funny, I considered ~this way first, but thought
> you'd not that approach.. dang, got it back-assward ;-)
>

Nicer this one.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

>> ---
>>  include/linux/sched.h |  1 +
>>  kernel/sched/core.c   | 18 +++++++++++-------
>>  2 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 1b43b45a22b9..a2001e01b3df 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1534,6 +1534,7 @@ struct task_struct {
>>       unsigned sched_reset_on_fork:1;
>>       unsigned sched_contributes_to_load:1;
>>       unsigned sched_migrated:1;
>> +     unsigned sched_remote_wakeup:1;
>>       unsigned :0; /* force alignment to the next boundary */
>>
>>       /* unserialized, strictly 'current' */
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 404c0784b1fc..7f2cae4620c7 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1768,13 +1768,15 @@ void sched_ttwu_pending(void)
>>       cookie = lockdep_pin_lock(&rq->lock);
>>
>>       while (llist) {
>> +             int wake_flags = 0;
>> +
>>               p = llist_entry(llist, struct task_struct, wake_entry);
>>               llist = llist_next(llist);
>> -             /*
>> -              * See ttwu_queue(); we only call ttwu_queue_remote() when
>> -              * its a x-cpu wakeup.
>> -              */
>> -             ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
>> +
>> +             if (p->sched_remote_wakeup)
>> +                     wake_flags = WF_MIGRATED;
>> +
>> +             ttwu_do_activate(rq, p, wake_flags, cookie);
>>       }
>>
>>       lockdep_unpin_lock(&rq->lock, cookie);
>> @@ -1819,10 +1821,12 @@ void scheduler_ipi(void)
>>       irq_exit();
>>  }
>>
>> -static void ttwu_queue_remote(struct task_struct *p, int cpu)
>> +static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
>>  {
>>       struct rq *rq = cpu_rq(cpu);
>>
>> +     p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
>> +
>>       if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
>>               if (!set_nr_if_polling(rq->idle))
>>                       smp_send_reschedule(cpu);
>> @@ -1869,7 +1873,7 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>>  #if defined(CONFIG_SMP)
>>       if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
>>               sched_clock_cpu(cpu); /* sync clocks x-cpu */
>> -             ttwu_queue_remote(p, cpu);
>> +             ttwu_queue_remote(p, cpu, wake_flags);
>>               return;
>>       }
>>  #endif



-- 
Regards,
Wanpeng Li


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-23  9:19         ` Peter Zijlstra
@ 2016-05-23  9:40           ` Mike Galbraith
  2016-05-23 10:13             ` Wanpeng Li
  2016-05-23 12:28             ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2016-05-23  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Mon, 2016-05-23 at 11:19 +0200, Peter Zijlstra wrote:
> On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > > 
> > > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > > and likely take a size XXL latency hit.  Big box running master bled
> > > > profusely under heavy load until I turned TTWU_QUEUE off.
> > 
> > May as well make it official and against master.today.  Fly or die
> > little patchlet.
> > 
> > sched/fair: Move se->vruntime normalization state into struct sched_entity
> 
> Does this work?

Yup, bugs--.  Kinda funny, I considered ~this way first, but thought
you'd not like that approach.. dang, got it back-assward ;-)

> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/core.c   | 18 +++++++++++-------
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1b43b45a22b9..a2001e01b3df 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1534,6 +1534,7 @@ struct task_struct {
>  	unsigned sched_reset_on_fork:1;
>  	unsigned sched_contributes_to_load:1;
>  	unsigned sched_migrated:1;
> +	unsigned sched_remote_wakeup:1;
>  	unsigned :0; /* force alignment to the next boundary */
>  
>  	/* unserialized, strictly 'current' */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 404c0784b1fc..7f2cae4620c7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1768,13 +1768,15 @@ void sched_ttwu_pending(void)
>  	cookie = lockdep_pin_lock(&rq->lock);
>  
>  	while (llist) {
> +		int wake_flags = 0;
> +
>  		p = llist_entry(llist, struct task_struct, wake_entry);
>  		llist = llist_next(llist);
> -		/*
> -		 * See ttwu_queue(); we only call ttwu_queue_remote() when
> -		 * its a x-cpu wakeup.
> -		 */
> -		ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
> +
> +		if (p->sched_remote_wakeup)
> +			wake_flags = WF_MIGRATED;
> +
> +		ttwu_do_activate(rq, p, wake_flags, cookie);
>  	}
>  
>  	lockdep_unpin_lock(&rq->lock, cookie);
> @@ -1819,10 +1821,12 @@ void scheduler_ipi(void)
>  	irq_exit();
>  }
>  
> -static void ttwu_queue_remote(struct task_struct *p, int cpu)
> +static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
>  {
>  	struct rq *rq = cpu_rq(cpu);
>  
> +	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
> +
>  	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
>  		if (!set_nr_if_polling(rq->idle))
>  			smp_send_reschedule(cpu);
> @@ -1869,7 +1873,7 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>  #if defined(CONFIG_SMP)
>  	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
>  		sched_clock_cpu(cpu); /* sync clocks x-cpu */
> -		ttwu_queue_remote(p, cpu);
> +		ttwu_queue_remote(p, cpu, wake_flags);
>  		return;
>  	}
>  #endif


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-22  7:00       ` [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity Mike Galbraith
  2016-05-22  9:36         ` Peter Zijlstra
@ 2016-05-23  9:19         ` Peter Zijlstra
  2016-05-23  9:40           ` Mike Galbraith
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-05-23  9:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > 
> > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > and likely take a size XXL latency hit.  Big box running master bled
> > > profusely under heavy load until I turned TTWU_QUEUE off.
> 
> May as well make it official and against master.today.  Fly or die
> little patchlet.
> 
> sched/fair: Move se->vruntime normalization state into struct sched_entity

Does this work?

---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1b43b45a22b9..a2001e01b3df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1534,6 +1534,7 @@ struct task_struct {
 	unsigned sched_reset_on_fork:1;
 	unsigned sched_contributes_to_load:1;
 	unsigned sched_migrated:1;
+	unsigned sched_remote_wakeup:1;
 	unsigned :0; /* force alignment to the next boundary */
 
 	/* unserialized, strictly 'current' */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 404c0784b1fc..7f2cae4620c7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1768,13 +1768,15 @@ void sched_ttwu_pending(void)
 	cookie = lockdep_pin_lock(&rq->lock);
 
 	while (llist) {
+		int wake_flags = 0;
+
 		p = llist_entry(llist, struct task_struct, wake_entry);
 		llist = llist_next(llist);
-		/*
-		 * See ttwu_queue(); we only call ttwu_queue_remote() when
-		 * its a x-cpu wakeup.
-		 */
-		ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
+
+		if (p->sched_remote_wakeup)
+			wake_flags = WF_MIGRATED;
+
+		ttwu_do_activate(rq, p, wake_flags, cookie);
 	}
 
 	lockdep_unpin_lock(&rq->lock, cookie);
@@ -1819,10 +1821,12 @@ void scheduler_ipi(void)
 	irq_exit();
 }
 
-static void ttwu_queue_remote(struct task_struct *p, int cpu)
+static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
 {
 	struct rq *rq = cpu_rq(cpu);
 
+	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);
+
 	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {
 		if (!set_nr_if_polling(rq->idle))
 			smp_send_reschedule(cpu);
@@ -1869,7 +1873,7 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 #if defined(CONFIG_SMP)
 	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
 		sched_clock_cpu(cpu); /* sync clocks x-cpu */
-		ttwu_queue_remote(p, cpu);
+		ttwu_queue_remote(p, cpu, wake_flags);
 		return;
 	}
 #endif


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-22  9:36         ` Peter Zijlstra
  2016-05-22  9:52           ` Mike Galbraith
@ 2016-05-22 10:33           ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-05-22 10:33 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Sun, May 22, 2016 at 11:36:38AM +0200, Peter Zijlstra wrote:
> > 
> > sched/fair: Move se->vruntime normalization state into struct sched_entity
> 
> Yeah, I used to have a patch like this; but for debugging. I don't
> particularly like carrying this information other than for verification
> because it means we either do too much or too little normalization.
> 
> I'll try and have a look on Monday, but I got some real-life things to
> sort out first..

Ah, I think I see why we might actually need a variable this time...
Brain seems to have worked overtime while pulling weeds in the garden
:-)

In any case, I'll have a look on Monday.


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-22  9:36         ` Peter Zijlstra
@ 2016-05-22  9:52           ` Mike Galbraith
  2016-05-22 10:33           ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2016-05-22  9:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Sun, 2016-05-22 at 11:36 +0200, Peter Zijlstra wrote:
> On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> > On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > > 
> > > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > > and likely take a size XXL latency hit.  Big box running master bled
> > > > profusely under heavy load until I turned TTWU_QUEUE off.
> > 
> > May as well make it official and against master.today.  Fly or die
> > little patchlet.
> > 
> > sched/fair: Move se->vruntime normalization state into struct sched_entity
> 
> Yeah, I used to have a patch like this; but for debugging. I don't
> particularly like carrying this information other than for verification
> because it means we either do too much or too little normalization.
> 
> I'll try and have a look on Monday, but I got some real-life things to
> sort out first..

Ok (flush, say hi to the goldfish little patchlet). I don't care how it
gets fixed, only that it does, and yours will likely be prettier :)

	-Mike


* Re: [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-22  7:00       ` [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity Mike Galbraith
@ 2016-05-22  9:36         ` Peter Zijlstra
  2016-05-22  9:52           ` Mike Galbraith
  2016-05-22 10:33           ` Peter Zijlstra
  2016-05-23  9:19         ` Peter Zijlstra
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2016-05-22  9:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mingo, linux-kernel, Pavan Kondeti, Ben Segall, Matt Fleming,
	Morten Rasmussen, Paul Turner, Thomas Gleixner, byungchul.park,
	Andrew Hunter

On Sun, May 22, 2016 at 09:00:01AM +0200, Mike Galbraith wrote:
> On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> > On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> > 
> > > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > > and likely take a size XXL latency hit.  Big box running master bled
> > > profusely under heavy load until I turned TTWU_QUEUE off.
> 
> May as well make it official and against master.today.  Fly or die
> little patchlet.
> 
> sched/fair: Move se->vruntime normalization state into struct sched_entity

Yeah, I used to have a patch like this; but for debugging. I don't
particularly like carrying this information other than for verification
because it means we either do too much or too little normalization.

I'll try and have a look on Monday, but I got some real-life things to
sort out first..


* [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity
  2016-05-21 19:00     ` Mike Galbraith
@ 2016-05-22  7:00       ` Mike Galbraith
  2016-05-22  9:36         ` Peter Zijlstra
  2016-05-23  9:19         ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Galbraith @ 2016-05-22  7:00 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, linux-kernel
  Cc: Pavan Kondeti, Ben Segall, Matt Fleming, Morten Rasmussen,
	Paul Turner, Thomas Gleixner, byungchul.park, Andrew Hunter

On Sat, 2016-05-21 at 21:00 +0200, Mike Galbraith wrote:
> On Sat, 2016-05-21 at 16:04 +0200, Mike Galbraith wrote:
> 
> > Wakees that were not migrated/normalized eat an unwanted min_vruntime,
> > and likely take a size XXL latency hit.  Big box running master bled
> > profusely under heavy load until I turned TTWU_QUEUE off.

May as well make it official and against master.today.  Fly or die
little patchlet.

sched/fair: Move se->vruntime normalization state into struct sched_entity

b5179ac70de ceased globally normalizing wakee vruntime in ttwu(), leaving
sched_ttwu_pending() with the need to know whether each wakee on wake_list
was migrated or not, to pass that on to fair class functions so they can
DTRT wrt vruntime normalization.  Store vruntime normalization state in
struct sched_entity, so fair class functions that need it always have it,
and sched_ttwu_pending() again doesn't need to care whether tasks on the
wake_list have been migrated or not.

Since there are now no consumers of ENQUEUE_MIGRATED, drop it as well. 

master v4.6-8889-gf6c658df6385 virgin
 256     49096  71698.99 MB/sec  warmup   1 sec  latency 1136.488 ms
 256    155009  72862.08 MB/sec  execute   1 sec  latency 3136.900 ms
 256    207430  72628.04 MB/sec  execute   2 sec  latency 4137.001 ms
 256    259635  72442.97 MB/sec  execute   3 sec  latency 5137.105 ms
 256    311905  72371.84 MB/sec  execute   4 sec  latency 6137.214 ms
 256    364210  72564.99 MB/sec  execute   5 sec  latency 7137.323 ms
 256    416551  72598.74 MB/sec  execute   6 sec  latency 5816.895 ms
 256    468824  72601.54 MB/sec  execute   7 sec  latency 6815.386 ms
 256    520996  72621.87 MB/sec  execute   8 sec  latency 7815.499 ms
 256    573113  72608.75 MB/sec  execute   9 sec  latency 8815.609 ms
 256  cleanup  10 sec
   0  cleanup  10 sec

master v4.6-8889-gf6c658df6385 post
 256     51527  75357.55 MB/sec  warmup   1 sec  latency 21.591 ms
 256    157610  73188.06 MB/sec  execute   1 sec  latency 12.985 ms
 256    210089  72809.01 MB/sec  execute   2 sec  latency 11.543 ms
 256    262554  72681.86 MB/sec  execute   3 sec  latency 0.209 ms
 256    315432  72798.65 MB/sec  execute   4 sec  latency 0.206 ms
 256    368162  72963.33 MB/sec  execute   5 sec  latency 8.052 ms
 256    420854  72976.50 MB/sec  execute   6 sec  latency 0.221 ms
 256    473420  72953.76 MB/sec  execute   7 sec  latency 0.198 ms
 256    525859  73011.17 MB/sec  execute   8 sec  latency 2.810 ms
 256    578301  73052.84 MB/sec  execute   9 sec  latency 0.247 ms
 256  cleanup  10 sec
   0  cleanup  10 sec
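The headline number in the two runs above is the latency column. A throwaway parser (the line format is assumed from the dbench-style output, abbreviated here to a couple of representative lines) makes the before/after comparison explicit:

```python
import re

def max_latency_ms(report: str) -> float:
    """Pull the worst 'latency N ms' value out of a dbench-style report."""
    return max(float(m) for m in re.findall(r"latency\s+([\d.]+)\s+ms", report))

virgin = """
 256    520996  72621.87 MB/sec  execute   8 sec  latency 7815.499 ms
 256    573113  72608.75 MB/sec  execute   9 sec  latency 8815.609 ms
"""
post = """
 256     51527  75357.55 MB/sec  warmup   1 sec  latency 21.591 ms
 256    157610  73188.06 MB/sec  execute   1 sec  latency 12.985 ms
"""

print(f"virgin worst:  {max_latency_ms(virgin):.3f} ms")
print(f"patched worst: {max_latency_ms(post):.3f} ms")
```

Worst-case latency drops from ~8.8 seconds to ~22 milliseconds while throughput stays within noise, which is what "bled profusely" versus fixed looks like in numbers.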


Fixes: b5179ac70de ("sched/fair: Prepare to fix fairness problems on migration")
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 include/linux/sched.h |    1 
 kernel/sched/core.c   |    6 +----
 kernel/sched/fair.c   |   60 ++++++++++++++++++++------------------------------
 3 files changed, 28 insertions(+), 39 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1319,6 +1319,7 @@ struct sched_entity {
 	struct rb_node		run_node;
 	struct list_head	group_node;
 	unsigned int		on_rq;
+	bool			normalized;
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1719,9 +1719,6 @@ ttwu_do_activate(struct rq *rq, struct t
 #ifdef CONFIG_SMP
 	if (p->sched_contributes_to_load)
 		rq->nr_uninterruptible--;
-
-	if (wake_flags & WF_MIGRATED)
-		en_flags |= ENQUEUE_MIGRATED;
 #endif
 
 	ttwu_activate(rq, p, en_flags);
@@ -1774,7 +1771,7 @@ void sched_ttwu_pending(void)
 		 * See ttwu_queue(); we only call ttwu_queue_remote() when
 		 * its a x-cpu wakeup.
 		 */
-		ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
+		ttwu_do_activate(rq, p, 0, cookie);
 	}
 
 	lockdep_unpin_lock(&rq->lock, cookie);
@@ -2166,6 +2163,7 @@ static void __sched_fork(unsigned long c
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+	p->se.normalized		= true;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3230,6 +3230,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 
 	/* ensure we never gain time by being placed backwards. */
 	se->vruntime = max_vruntime(se->vruntime, vruntime);
+	se->normalized = false;
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
@@ -3285,29 +3286,40 @@ static inline void check_schedstat_requi
  * CPU and an up-to-date min_vruntime on the destination CPU.
  */
 
+static void normalize_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	se->vruntime -= cfs_rq->min_vruntime;
+	se->normalized = true;
+}
+
+static void renormalize_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	se->vruntime += cfs_rq->min_vruntime;
+	se->normalized = false;
+}
+
 static void
 enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
-	bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
-	bool curr = cfs_rq->curr == se;
+	bool renorm = se->normalized, curr = cfs_rq->curr == se;
 
 	/*
-	 * If we're the current task, we must renormalise before calling
+	 * If we're the current task, we must renormalize before calling
 	 * update_curr().
 	 */
 	if (renorm && curr)
-		se->vruntime += cfs_rq->min_vruntime;
+		renormalize_entity(cfs_rq, se);
 
 	update_curr(cfs_rq);
 
 	/*
-	 * Otherwise, renormalise after, such that we're placed at the current
+	 * Otherwise, renormalize after, such that we're placed at the current
 	 * moment in time, instead of some random moment in the past. Being
 	 * placed in the past could significantly boost this task to the
 	 * fairness detriment of existing tasks.
 	 */
 	if (renorm && !curr)
-		se->vruntime += cfs_rq->min_vruntime;
+		renormalize_entity(cfs_rq, se);
 
 	enqueue_entity_load_avg(cfs_rq, se);
 	account_entity_enqueue(cfs_rq, se);
@@ -3406,7 +3418,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 	 * movement in our normalized position.
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
-		se->vruntime -= cfs_rq->min_vruntime;
+		normalize_entity(cfs_rq, se);
 
 	/* return excess runtime on last dequeue */
 	return_cfs_rq_runtime(cfs_rq);
@@ -5408,7 +5420,7 @@ static void migrate_task_rq_fair(struct
 		min_vruntime = cfs_rq->min_vruntime;
 #endif
 
-		se->vruntime -= min_vruntime;
+		normalize_entity(cfs_rq, se);
 	}
 
 	/*
@@ -8319,7 +8331,7 @@ static void task_fork_fair(struct task_s
 		resched_curr(rq);
 	}
 
-	se->vruntime -= cfs_rq->min_vruntime;
+	normalize_entity(cfs_rq, se);
 
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
@@ -8348,29 +8360,7 @@ prio_changed_fair(struct rq *rq, struct
 
 static inline bool vruntime_normalized(struct task_struct *p)
 {
-	struct sched_entity *se = &p->se;
-
-	/*
-	 * In both the TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING cases,
-	 * the dequeue_entity(.flags=0) will already have normalized the
-	 * vruntime.
-	 */
-	if (p->on_rq)
-		return true;
-
-	/*
-	 * When !on_rq, vruntime of the task has usually NOT been normalized.
-	 * But there are some cases where it has already been normalized:
-	 *
-	 * - A forked child which is waiting for being woken up by
-	 *   wake_up_new_task().
-	 * - A task which has been woken up by try_to_wake_up() and
-	 *   waiting for actually being woken up by sched_ttwu_pending().
-	 */
-	if (!se->sum_exec_runtime || p->state == TASK_WAKING)
-		return true;
-
-	return false;
+	return p->se.normalized;
 }
 
 static void detach_task_cfs_rq(struct task_struct *p)
@@ -8384,7 +8374,7 @@ static void detach_task_cfs_rq(struct ta
 		 * cause 'unlimited' sleep bonus.
 		 */
 		place_entity(cfs_rq, se, 0);
-		se->vruntime -= cfs_rq->min_vruntime;
+		normalize_entity(cfs_rq, se);
 	}
 
 	/* Catch up with the cfs_rq and remove our load when we leave */
@@ -8407,8 +8397,8 @@ static void attach_task_cfs_rq(struct ta
 	/* Synchronize task with its cfs_rq */
 	attach_entity_load_avg(cfs_rq, se);
 
-	if (!vruntime_normalized(p))
-		se->vruntime += cfs_rq->min_vruntime;
+	if (vruntime_normalized(p))
+		renormalize_entity(cfs_rq, se);
 }
 
 static void switched_from_fair(struct rq *rq, struct task_struct *p)


end of thread, other threads:[~2016-05-25 17:49 UTC | newest]

Thread overview: 11+ messages
2016-05-24 17:04 [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity Paul E. McKenney
2016-05-25 17:49 ` Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2016-05-10 17:43 [PATCH 0/3] sched: Fix wakeup preemption regression Peter Zijlstra
2016-05-10 17:43 ` [PATCH 2/3] sched,fair: Fix local starvation Peter Zijlstra
2016-05-21 14:04   ` Mike Galbraith
2016-05-21 19:00     ` Mike Galbraith
2016-05-22  7:00       ` [patch] sched/fair: Move se->vruntime normalization state into struct sched_entity Mike Galbraith
2016-05-22  9:36         ` Peter Zijlstra
2016-05-22  9:52           ` Mike Galbraith
2016-05-22 10:33           ` Peter Zijlstra
2016-05-23  9:19         ` Peter Zijlstra
2016-05-23  9:40           ` Mike Galbraith
2016-05-23 10:13             ` Wanpeng Li
2016-05-23 10:26               ` Mike Galbraith
2016-05-23 12:28             ` Peter Zijlstra
