* [PATCH 1/1] sched: cfs_rq h_load might not update due to irq disable
From: YT Chang @ 2019-11-21  8:30 UTC
  To: YT Chang, Peter Zijlstra, Matthias Brugger
  Cc: linux-arm-kernel, linux-mediatek, linux-kernel, wsd_upstream

Syndrome:

Two CPUs might do idle balance at the same time.
One CPU does idle balance and pulls some tasks.
However, before it picks the next task, ALL tasks are pulled back to the other CPU.
That results in an infinite loop on both CPUs.

=========================================
code flow:

in pick_next_task_fair()

again:

if nr_running == 0
	goto idle
pick next task
	return

idle:
	idle_balance
	/* pull some tasks from the other CPU;
	 * however, the other CPU is also doing idle balance
	 * and pulls these tasks back */

	goto again
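
A rough C rendering of that flow (a simplified sketch, not the actual
kernel source; pick() and idle_balance() stand in for the real helpers):

static struct task_struct *pick_next_task_fair_sketch(struct rq *rq)
{
again:
	if (!rq->cfs.nr_running)
		goto idle;

	return pick(rq);	/* normal case: pick a task and return it */

idle:
	/*
	 * idle_balance() may pull tasks over from another CPU.  If that
	 * CPU is doing the same thing and immediately pulls them back,
	 * both CPUs keep bouncing between the pick path and this path.
	 */
	if (idle_balance(rq) > 0)
		goto again;

	return NULL;
}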

=========================================
ALL tasks end up being pulled back when task_h_load()
is incorrect and too low.

static unsigned long task_h_load(struct task_struct *p)
{
        struct cfs_rq *cfs_rq = task_cfs_rq(p);

	update_cfs_rq_h_load(cfs_rq);
	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
			cfs_rq->runnable_load_avg + 1);
}
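
When task_h_load() comes back far too small, nothing in the balancer's
detach loop stops it before the queue is empty. A rough sketch, loosely
modelled on detach_tasks() (names and guards simplified):

	while (!list_empty(tasks)) {
		p = list_first_entry(tasks, struct task_struct, se.group_node);

		load = task_h_load(p);

		/* with a stale, near-zero h_load this check never fires... */
		if ((load / 2) > env->imbalance)
			goto next;

		detach_task(p, env);
		list_add(&p->se.group_node, &env->tasks);

		/* ...and the imbalance budget is barely consumed */
		env->imbalance -= load;
		if (env->imbalance <= 0)
			break;
next:
		list_move_tail(&p->se.group_node, tasks);
	}

With load close to zero, neither check ever trips, so the whole
runqueue gets moved.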

The cfs_rq->h_load is incorrect and might be too small.
The original idea behind cfs_rq::last_h_load_update is to not
update cfs_rq::h_load more than once per jiffy.
When the two CPUs pull from each other in pick_next_task_fair(),
irqs are disabled, so jiffies does not advance.
(The other CPUs are waiting for the runqueue locks held by the two
CPUs, so ALL CPUs have irqs disabled.)

Solution:
cfs_rq::h_load might not be updated while irqs are disabled,
so use sched_clock instead of jiffies.

Signed-off-by: YT Chang <yt.chang@mediatek.com>
---
 kernel/sched/fair.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83ab35e..231c53f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7578,9 +7578,11 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
-	unsigned long now = jiffies;
+	u64 now = sched_clock_cpu(cpu_of(rq));
 	unsigned long load;
 
+	now = now * HZ >> 30;
+
 	if (cfs_rq->last_h_load_update == now)
 		return;
 
-- 
1.9.1

* Re: [PATCH 1/1] sched: cfs_rq h_load might not update due to irq disable
From: Peter Zijlstra @ 2019-11-21 12:38 UTC
  To: YT Chang
  Cc: wsd_upstream, linux-kernel, linux-mediatek, Matthias Brugger,
	linux-arm-kernel

On Thu, Nov 21, 2019 at 04:30:09PM +0800, YT Chang wrote:
> Syndrome:
> 
> Two CPUs might do idle balance at the same time.
> One CPU does idle balance and pulls some tasks.
> However, before it picks the next task, ALL tasks are pulled back to the other CPU.
> That results in an infinite loop on both CPUs.

Can you easily reproduce this?

> =========================================
> code flow:
> 
> in pick_next_task_fair()
> 
> again:
> 
> if nr_running == 0
> 	goto idle
> pick next task
> 	return
> 
> idle:
> 	idle_balance
> 	/* pull some tasks from the other CPU;
> 	 * however, the other CPU is also doing idle balance
> 	 * and pulls these tasks back */
> 
> 	goto again
> 
> =========================================
> ALL tasks end up being pulled back when task_h_load()
> is incorrect and too low.

Clearly you're not running a PREEMPT kernel, otherwise the break in
detach_tasks() would've saved you, right?
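
(The break being referred to is, roughly, the following guard in
detach_tasks(); paraphrased here, and the exact config symbol differs
between kernel versions:

#ifdef CONFIG_PREEMPT
	/*
	 * NEWIDLE balancing is a source of latency, so preemptible
	 * kernels stop after detaching the first task.
	 */
	if (env->idle == CPU_NEWLY_IDLE)
		break;
#endif

so a PREEMPT build moves at most one task per newidle pass and cannot
bounce a whole runqueue back and forth.)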

> static unsigned long task_h_load(struct task_struct *p)
> {
>         struct cfs_rq *cfs_rq = task_cfs_rq(p);
> 
> 	update_cfs_rq_h_load(cfs_rq);
> 	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
> 			cfs_rq->runnable_load_avg + 1);
> }
> 
> The cfs_rq->h_load is incorrect and might be too small.
> The original idea behind cfs_rq::last_h_load_update is to not
> update cfs_rq::h_load more than once per jiffy.
> When the two CPUs pull from each other in pick_next_task_fair(),
> irqs are disabled, so jiffies does not advance.
> (The other CPUs are waiting for the runqueue locks held by the two
> CPUs, so ALL CPUs have irqs disabled.)

This cannot be true; because the loop drops rq->lock, so other CPUs
should have an opportunity to acquire the lock and make progress.

> Solution:
> cfs_rq::h_load might not be updated while irqs are disabled,
> so use sched_clock instead of jiffies.
> 
> Signed-off-by: YT Chang <yt.chang@mediatek.com>
> ---
>  kernel/sched/fair.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 83ab35e..231c53f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7578,9 +7578,11 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
>  {
>  	struct rq *rq = rq_of(cfs_rq);
>  	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
> -	unsigned long now = jiffies;
> +	u64 now = sched_clock_cpu(cpu_of(rq));
>  	unsigned long load;
>  
> +	now = now * HZ >> 30;
> +
>  	if (cfs_rq->last_h_load_update == now)
>  		return;
>  

This is disgusting and wrong. That is not the correct relation between
sched_clock() and jiffies.
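
(For reference, a sketch of the arithmetic, assuming sched_clock_cpu()
returns nanoseconds: an exact tick count would be something like

	u64 ticks = div_u64(now_ns, NSEC_PER_SEC / HZ);

which is essentially what nsecs_to_jiffies() computes, whereas
"now * HZ >> 30" divides by 2^30 (~1.074e9) instead of 10^9, so each
derived tick spans roughly 7% more time than a real jiffy.)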


* Re: [PATCH 1/1] sched: cfs_rq h_load might not update due to irq disable
From: Kathleen Chang @ 2019-11-26  3:00 UTC
  To: Peter Zijlstra
  Cc: wsd_upstream, linux-kernel, linux-mediatek, Matthias Brugger,
	linux-arm-kernel

On Thu, 2019-11-21 at 13:38 +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2019 at 04:30:09PM +0800, YT Chang wrote:
> > Syndrome:
> > 
> > Two CPUs might do idle balance at the same time.
> > One CPU does idle balance and pulls some tasks.
> > However, before it picks the next task, ALL tasks are pulled back to the other CPU.
> > That results in an infinite loop on both CPUs.
> 
> Can you easily reproduce this?

No, I can't easily reproduce this. 
> 
> > =========================================
> > code flow:
> > 
> > in pick_next_task_fair()
> > 
> > again:
> > 
> > if nr_running == 0
> > 	goto idle
> > pick next task
> > 	return
> > 
> > idle:
> > 	idle_balance
> > 	/* pull some tasks from the other CPU;
> > 	 * however, the other CPU is also doing idle balance
> > 	 * and pulls these tasks back */
> > 
> > 	goto again
> > 
> > =========================================
> > ALL tasks end up being pulled back when task_h_load()
> > is incorrect and too low.
> 
> Clearly you're not running a PREEMPT kernel, otherwise the break in
> detach_tasks() would've saved you, right?
> 
> > static unsigned long task_h_load(struct task_struct *p)
> > {
> >         struct cfs_rq *cfs_rq = task_cfs_rq(p);
> > 
> > 	update_cfs_rq_h_load(cfs_rq);
> > 	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
> > 			cfs_rq->runnable_load_avg + 1);
> > }
> > 
> > The cfs_rq->h_load is incorrect and might be too small.
> > The original idea behind cfs_rq::last_h_load_update is to not
> > update cfs_rq::h_load more than once per jiffy.
> > When the two CPUs pull from each other in pick_next_task_fair(),
> > irqs are disabled, so jiffies does not advance.
> > (The other CPUs are waiting for the runqueue locks held by the two
> > CPUs, so ALL CPUs have irqs disabled.)
> 
> This cannot be true; because the loop drops rq->lock, so other CPUs
> should have an opportunity to acquire the lock and make progress.

I rechecked the other CPUs' situation.
The other CPUs have irqs disabled because they are waiting for a lock
(not the runqueue lock).

The root cause to address should be why the other CPUs are waiting for
that lock, rather than replacing jiffies with sched_clock().

> 
> > Solution:
> > cfs_rq::h_load might not be updated while irqs are disabled,
> > so use sched_clock instead of jiffies.
> > 
> > Signed-off-by: YT Chang <yt.chang@mediatek.com>
> > ---
> >  kernel/sched/fair.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 83ab35e..231c53f 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7578,9 +7578,11 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
> >  {
> >  	struct rq *rq = rq_of(cfs_rq);
> >  	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
> > -	unsigned long now = jiffies;
> > +	u64 now = sched_clock_cpu(cpu_of(rq));
> >  	unsigned long load;
> >  
> > +	now = now * HZ >> 30;
> > +
> >  	if (cfs_rq->last_h_load_update == now)
> >  		return;
> >  
> 
> This is disgusting and wrong. That is not the correct relation between
> sched_clock() and jiffies.
