linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/fair: Fix enqueue_task_fair warning some more
@ 2020-05-06 14:18 Phil Auld
  2020-05-06 16:36 ` Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-06 14:18 UTC (permalink / raw)
  To: peterz; +Cc: linux-kernel, mingo, vincent.guittot, juri.lelli

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
did not fully resolve the issues with the (rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop.
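
For reference, the warning comes from the assertion at the end of
enqueue_task_fair(), which is essentially (modulo the SCHED_DEBUG
plumbing):

	static inline void assert_list_leaf_cfs_rq(struct rq *rq)
	{
		SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
	}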

Address this issue by saving the se pointer when the first loop exits and
resetting it before doing the fix up, if needed.

Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..719c996317e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	struct sched_entity *saved_se = NULL;
 	int idle_h_nr_running = task_has_idle_policy(p);
 
 	/*
@@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		flags = ENQUEUE_WAKEUP;
 	}
 
+	saved_se = se;
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 
@@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		 * leaf list maintenance, resulting in triggering the assertion
 		 * below.
 		 */
+		if (saved_se)
+			se = saved_se;
 		for_each_sched_entity(se) {
 			cfs_rq = cfs_rq_of(se);
 
-- 
2.18.0


Cheers,
Phil



* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-06 14:18 [PATCH] sched/fair: Fix enqueue_task_fair warning some more Phil Auld
@ 2020-05-06 16:36 ` Vincent Guittot
  2020-05-06 18:05   ` Phil Auld
  2020-05-07 20:36 ` [PATCH v2] " Phil Auld
  2020-05-12 13:52 ` [PATCH v3] " Phil Auld
  2 siblings, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-06 16:36 UTC (permalink / raw)
  To: Phil Auld; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Phil,

- reply to all this time

On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
>
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> did not fully resolve the issues with the (rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list.  In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do

But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
in the list, shouldn't it?

The third for_each_sched_entity loop is there for the throttled case but is
useless in the other cases.

Could you provide us with some details about the use case that creates such
a situation?

> what is needed because se does not point to the sched_entity which broke out
> of the first loop.
>
> Address this issue by saving the se pointer when the first loop exits and
> resetting it before doing the fix up, if needed.
>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..719c996317e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  {
>         struct cfs_rq *cfs_rq;
>         struct sched_entity *se = &p->se;
> +       struct sched_entity *saved_se = NULL;
>         int idle_h_nr_running = task_has_idle_policy(p);
>
>         /*
> @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 flags = ENQUEUE_WAKEUP;
>         }
>
> +       saved_se = se;
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
> @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                  * leaf list maintenance, resulting in triggering the assertion
>                  * below.
>                  */
> +               if (saved_se)
> +                       se = saved_se;
>                 for_each_sched_entity(se) {
>                         cfs_rq = cfs_rq_of(se);
>
> --
> 2.18.0
>
>
> Cheers,
> Phil
>


* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-06 16:36 ` Vincent Guittot
@ 2020-05-06 18:05   ` Phil Auld
  2020-05-07 15:06     ` Vincent Guittot
  0 siblings, 1 reply; 29+ messages in thread
From: Phil Auld @ 2020-05-06 18:05 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Vincent,

Thanks for taking a look. More below...

On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> Hi Phil,
> 
> - reply to all this time
> 
> On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
> >
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list.  In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> 
> But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
> in the list, shouldn't it?

No. In this case we hit the parent-not-on-list case in list_add_leaf_cfs_rq,
which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
the same. It returns false, expecting the parent to be added later.

But then the parent doesn't get there because it's on_rq.
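
That last case in list_add_leaf_cfs_rq is, roughly (paraphrasing the code
rather than quoting it verbatim):

	/* parent not on the list: park cfs_rq on the temporary branch */
	list_add_rcu(&cfs_rq->leaf_cfs_rq_list, rq->tmp_alone_branch);
	rq->tmp_alone_branch = &cfs_rq->leaf_cfs_rq_list;
	return false;	/* caller is expected to add the parent next */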

> 
> The third for_each_sched_entity loop is there for the throttled case but is
> useless in the other cases.
>

There actually is throttling involved, usually. The second loop breaks out
early because one of the parents is throttled. But not before it advances
se at least once.

Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
with the right se.
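
For reference, that 3rd loop is the fixup fe61468b2cb added after the
enqueue_throttle label, roughly:

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		if (list_add_leaf_cfs_rq(cfs_rq))
			break;
	}

	assert_list_leaf_cfs_rq(rq);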

> Could you provide us with some details about the use case that creates such
> a situation?
>

I admit I had to add trace_printks to get here. Here's what it showed (sorry
for the long lines...)

1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist  Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
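
These came from ad-hoc trace_printk() calls; line 1, for example, was
produced by something along these lines (a reconstruction, not the exact
statement I used):

	trace_printk("se 0x%px on_rq %d cfs_rq = 0x%px\n",
		     se, se->on_rq, cfs_rq_of(se));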


Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are from the
first call to list_add_leaf_cfs_rq, with line 2 being at the start and line 3 showing which
of the 3 cases we hit.

Line 5 is right after the first loop.

Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
Line 7 is right below the enqueue_throttle label.

Lines 8 and 9 are from the fixup loop and, since onlist is set for both of these, it doesn't
do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
from the one call to list_add_leaf_cfs_rq that did something, and so the cleanup doesn't work.

Based on the comment at the cleanup, it looked like it expected the se to be what it was
when the first loop broke, not whatever it was left at after the second loop. It could have
been NULL there too, I guess, but I didn't hit that case.

This is 100% reproducible. And completely gone with the fix. I have a trace showing that.

Does that make more sense?



Cheers,
Phil


> > what is needed because se does not point to the sched_entity which broke out
> > of the first loop.
> >
> > Address this issue by saving the se pointer when the first loop exits and
> > resetting it before doing the fix up, if needed.
> >
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Juri Lelli <juri.lelli@redhat.com>
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..719c996317e3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >  {
> >         struct cfs_rq *cfs_rq;
> >         struct sched_entity *se = &p->se;
> > +       struct sched_entity *saved_se = NULL;
> >         int idle_h_nr_running = task_has_idle_policy(p);
> >
> >         /*
> > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >                 flags = ENQUEUE_WAKEUP;
> >         }
> >
> > +       saved_se = se;
> >         for_each_sched_entity(se) {
> >                 cfs_rq = cfs_rq_of(se);
> >
> > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >                  * leaf list maintenance, resulting in triggering the assertion
> >                  * below.
> >                  */
> > +               if (saved_se)
> > +                       se = saved_se;
> >                 for_each_sched_entity(se) {
> >                         cfs_rq = cfs_rq_of(se);
> >
> > --
> > 2.18.0
> >
> >
> > Cheers,
> > Phil
> >
> 

-- 



* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-06 18:05   ` Phil Auld
@ 2020-05-07 15:06     ` Vincent Guittot
  2020-05-07 15:17       ` Phil Auld
  2020-05-07 18:04       ` Phil Auld
  0 siblings, 2 replies; 29+ messages in thread
From: Vincent Guittot @ 2020-05-07 15:06 UTC (permalink / raw)
  To: Phil Auld; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Phil,

On Wed, 6 May 2020 at 20:05, Phil Auld <pauld@redhat.com> wrote:
>
> Hi Vincent,
>
> Thanks for taking a look. More below...
>
> On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > - reply to all this time
> >
> > On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
> > >
> > > sched/fair: Fix enqueue_task_fair warning some more
> > >
> > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > updated the list.  In this case the second for_each_sched_entity loop can
> > > further modify se. The later code to fix up the list management fails to do
> >
> > But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
> > in the list, shouldn't it?
>
> No. In this case we hit the parent-not-on-list case in list_add_leaf_cfs_rq,
> which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> the same. It returns false, expecting the parent to be added later.
>
> But then the parent doesn't get there because it's on_rq.
>
> >
> > The third for_each_sched_entity loop is there for the throttled case but is
> > useless in the other cases.
> >
>
> There actually is throttling involved, usually. The second loop breaks out
> early because one of the parents is throttled. But not before it advances
> se at least once.

Ok, so it is indeed because of the throttling that the problem occurs.

>
> Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> with the right se.
>
> > Could you provide us with some details about the use case that creates such
> > a situation?
> >
>
> I admit I had to add trace_printks to get here. Here's what it showed (sorry
> for the long lines...)
>
> 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist  Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
>
>
> Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are from the
> first call to list_add_leaf_cfs_rq, with line 2 being at the start and line 3 showing which
> of the 3 cases we hit.
>
> Line 5 is right after the first loop.
>
> Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> Line 7 is right below the enqueue_throttle label.
>
> Lines 8 and 9 are from the fixup loop and, since onlist is set for both of these, it doesn't
> do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> from the one call to list_add_leaf_cfs_rq that did something, and so the cleanup doesn't work.
>
> Based on the comment at the cleanup, it looked like it expected the se to be what it was
> when the first loop broke, not whatever it was left at after the second loop. It could have
> been NULL there too, I guess, but I didn't hit that case.
>
> This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
>
> Does that make more sense?

Yes, good catch.
And thanks for the detailed explanation.

>
>
>
> Cheers,
> Phil
>
>
> > > what is needed because se does not point to the sched_entity which broke out
> > > of the first loop.
> > >
> > > Address this issue by saving the se pointer when the first loop exits and
> > > resetting it before doing the fix up, if needed.
> > >
> > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > ---
> > >  kernel/sched/fair.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 02f323b85b6d..719c996317e3 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >  {
> > >         struct cfs_rq *cfs_rq;
> > >         struct sched_entity *se = &p->se;
> > > +       struct sched_entity *saved_se = NULL;
> > >         int idle_h_nr_running = task_has_idle_policy(p);
> > >
> > >         /*
> > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >                 flags = ENQUEUE_WAKEUP;
> > >         }
> > >
> > > +       saved_se = se;

TBH, I don't like saving and going back to the saved se and looping one
more time on them.

> > >         for_each_sched_entity(se) {
> > >                 cfs_rq = cfs_rq_of(se);

Could you add something like below in the 2nd loop instead?

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
task_struct *p, int flags)
                /* end evaluation on encountering a throttled cfs_rq */
                if (cfs_rq_throttled(cfs_rq))
                        goto enqueue_throttle;
+
+               /*
+                * One parent has been throttled and cfs_rq removed from the
+                * list. Add it back to not break the leaf list.
+                */
+               if (throttled_hierarchy(cfs_rq))
+                       list_add_leaf_cfs_rq(cfs_rq);
        }

 enqueue_throttle:
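
(For completeness, throttled_hierarchy() is essentially:

	static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
	{
		/* non-zero when this cfs_rq or any of its ancestors is throttled */
		return cfs_bandwidth_used() && cfs_rq->throttle_count;
	}

so it also catches a cfs_rq whose parent got throttled, not only a cfs_rq
that is throttled itself.)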

> > >
> > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >                  * leaf list maintenance, resulting in triggering the assertion
> > >                  * below.
> > >                  */
> > > +               if (saved_se)
> > > +                       se = saved_se;
> > >                 for_each_sched_entity(se) {
> > >                         cfs_rq = cfs_rq_of(se);
> > >
> > > --
> > > 2.18.0
> > >
> > >
> > > Cheers,
> > > Phil
> > >
> >
>
> --
>


* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-07 15:06     ` Vincent Guittot
@ 2020-05-07 15:17       ` Phil Auld
  2020-05-07 18:04       ` Phil Auld
  1 sibling, 0 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-07 15:17 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Vincent,

On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> Hi Phil,
> 
> On Wed, 6 May 2020 at 20:05, Phil Auld <pauld@redhat.com> wrote:
> >
> > Hi Vincent,
> >
> > Thanks for taking a look. More below...
> >
> > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > Hi Phil,
> > >
> > > - reply to all this time
> > >
> > > On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
> > > >
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > updated the list.  In this case the second for_each_sched_entity loop can
> > > > further modify se. The later code to fix up the list management fails to do
> > >
> > > But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
> > > in the list, shouldn't it?
> >
> > No. In this case we hit the parent-not-on-list case in list_add_leaf_cfs_rq,
> > which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> > the same. It returns false, expecting the parent to be added later.
> >
> > But then the parent doesn't get there because it's on_rq.
> >
> > >
> > > The third for_each_sched_entity loop is there for the throttled case but is
> > > useless in the other cases.
> > >
> >
> > There actually is throttling involved, usually. The second loop breaks out
> > early because one of the parents is throttled. But not before it advances
> > se at least once.
>
> Ok, so it is indeed because of the throttling that the problem occurs.
>
> >
> > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > with the right se.
> >
> > > Could you provide us with some details about the use case that creates such
> > > a situation?
> > >
> >
> > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > for the long lines...)
> >
> > 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist  Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> >
> >
> > Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are from the
> > first call to list_add_leaf_cfs_rq, with line 2 being at the start and line 3 showing which
> > of the 3 cases we hit.
> >
> > Line 5 is right after the first loop.
> >
> > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > Line 7 is right below the enqueue_throttle label.
> >
> > Lines 8 and 9 are from the fixup loop and, since onlist is set for both of these, it doesn't
> > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > from the one call to list_add_leaf_cfs_rq that did something, and so the cleanup doesn't work.
> >
> > Based on the comment at the cleanup, it looked like it expected the se to be what it was
> > when the first loop broke, not whatever it was left at after the second loop. It could have
> > been NULL there too, I guess, but I didn't hit that case.
> >
> > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> >
> > Does that make more sense?
> 
> Yes, good catch.
> And thanks for the detailed explanation.

No problem. I had to see it all myself anyway :)


> 
> >
> >
> >
> > Cheers,
> > Phil
> >
> >
> > > > what is needed because se does not point to the sched_entity which broke out
> > > > of the first loop.
> > > >
> > > > Address this issue by saving the se pointer when the first loop exits and
> > > > resetting it before doing the fix up, if needed.
> > > >
> > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > > ---
> > > >  kernel/sched/fair.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..719c996317e3 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >  {
> > > >         struct cfs_rq *cfs_rq;
> > > >         struct sched_entity *se = &p->se;
> > > > +       struct sched_entity *saved_se = NULL;
> > > >         int idle_h_nr_running = task_has_idle_policy(p);
> > > >
> > > >         /*
> > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                 flags = ENQUEUE_WAKEUP;
> > > >         }
> > > >
> > > > +       saved_se = se;
> 
> TBH, I don't like saving and going back to the saved se and looping one
> more time on them.
> 
> > > >         for_each_sched_entity(se) {
> > > >                 cfs_rq = cfs_rq_of(se);
> 
> Could you add something like below in the 2nd loop instead?

I'll give it a try this way and let you know. I had to give that machine
away. I'll get another and make sure it hits and then we'll see.


Thanks,
Phil



> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> task_struct *p, int flags)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto enqueue_throttle;
> +
> +               /*
> +                * One parent has been throttled and cfs_rq removed from the
> +                * list. Add it back to not break the leaf list.
> +                */
> +               if (throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
>         }
> 
>  enqueue_throttle:
> 
> > > >
> > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                  * leaf list maintenance, resulting in triggering the assertion
> > > >                  * below.
> > > >                  */
> > > > +               if (saved_se)
> > > > +                       se = saved_se;
> > > >                 for_each_sched_entity(se) {
> > > >                         cfs_rq = cfs_rq_of(se);
> > > >
> > > > --
> > > > 2.18.0
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > >
> >
> > --
> >
> 

-- 



* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-07 15:06     ` Vincent Guittot
  2020-05-07 15:17       ` Phil Auld
@ 2020-05-07 18:04       ` Phil Auld
  2020-05-07 18:21         ` Vincent Guittot
  1 sibling, 1 reply; 29+ messages in thread
From: Phil Auld @ 2020-05-07 18:04 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Vincent,

On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> Hi Phil,
> 
> On Wed, 6 May 2020 at 20:05, Phil Auld <pauld@redhat.com> wrote:
> >
> > Hi Vincent,
> >
> > Thanks for taking a look. More below...
> >
> > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > Hi Phil,
> > >
> > > - reply to all this time
> > >
> > > On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
> > > >
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > updated the list.  In this case the second for_each_sched_entity loop can
> > > > further modify se. The later code to fix up the list management fails to do
> > >
> > > But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
> > > in the list, shouldn't it?
> >
> > No. In this case we hit the parent-not-on-list case in list_add_leaf_cfs_rq,
> > which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> > the same. It returns false, expecting the parent to be added later.
> >
> > But then the parent doesn't get there because it's on_rq.
> >
> > >
> > > The third for_each_sched_entity loop is there for the throttled case but is
> > > useless in the other cases.
> > >
> >
> > There actually is throttling involved, usually. The second loop breaks out
> > early because one of the parents is throttled. But not before it advances
> > se at least once.
>
> Ok, so it is indeed because of the throttling that the problem occurs.
>
> >
> > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > with the right se.
> >
> > > Could you provide us with some details about the use case that creates such
> > > a situation?
> > >
> >
> > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > for the long lines...)
> >
> > 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist  Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> >
> >
> > Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are from the
> > first call to list_add_leaf_cfs_rq, with line 2 being at the start and line 3 showing which
> > of the 3 cases we hit.
> >
> > Line 5 is right after the first loop.
> >
> > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > Line 7 is right below the enqueue_throttle label.
> >
> > Lines 8 and 9 are from the fixup loop and, since onlist is set for both of these, it doesn't
> > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > from the one call to list_add_leaf_cfs_rq that did something, and so the cleanup doesn't work.
> >
> > Based on the comment at the cleanup, it looked like it expected the se to be what it was
> > when the first loop broke, not whatever it was left at after the second loop. It could have
> > been NULL there too, I guess, but I didn't hit that case.
> >
> > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> >
> > Does that make more sense?
> 
> Yes, good catch.
> And thanks for the detailed explanation.
> 
> >
> >
> >
> > Cheers,
> > Phil
> >
> >
> > > > what is needed because se does not point to the sched_entity which broke out
> > > > of the first loop.
> > > >
> > > > Address this issue by saving the se pointer when the first loop exits and
> > > > resetting it before doing the fix up, if needed.
> > > >
> > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > > ---
> > > >  kernel/sched/fair.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..719c996317e3 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >  {
> > > >         struct cfs_rq *cfs_rq;
> > > >         struct sched_entity *se = &p->se;
> > > > +       struct sched_entity *saved_se = NULL;
> > > >         int idle_h_nr_running = task_has_idle_policy(p);
> > > >
> > > >         /*
> > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                 flags = ENQUEUE_WAKEUP;
> > > >         }
> > > >
> > > > +       saved_se = se;
> 
> TBH, I don't like saving and going back to the saved se and looping one
> more time on them.
> 
> > > >         for_each_sched_entity(se) {
> > > >                 cfs_rq = cfs_rq_of(se);
> 
> Could you add something like below in the 2nd loop instead?

This one solves the problem as well with no other visible issues.

Do you want to spin it into a real patch?

You can have my {reported/tested}-by or whatever you need.


Cheers,
Phil

> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> task_struct *p, int flags)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto enqueue_throttle;
> +
> +               /*
> +                * One parent has been throttled and cfs_rq removed from the
> +                * list. Add it back to not break the leaf list.
> +                */
> +               if (throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
>         }
> 
>  enqueue_throttle:
> 
> > > >
> > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                  * leaf list maintenance, resulting in triggering the assertion
> > > >                  * below.
> > > >                  */
> > > > +               if (saved_se)
> > > > +                       se = saved_se;
> > > >                 for_each_sched_entity(se) {
> > > >                         cfs_rq = cfs_rq_of(se);
> > > >
> > > > --
> > > > 2.18.0
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > >
> >
> > --
> >
> 

-- 



* Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-07 18:04       ` Phil Auld
@ 2020-05-07 18:21         ` Vincent Guittot
  0 siblings, 0 replies; 29+ messages in thread
From: Vincent Guittot @ 2020-05-07 18:21 UTC (permalink / raw)
  To: Phil Auld; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

On Thu, 7 May 2020 at 20:04, Phil Auld <pauld@redhat.com> wrote:
>
> Hi Vincent,
>
> On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > On Wed, 6 May 2020 at 20:05, Phil Auld <pauld@redhat.com> wrote:
> > >
> > > Hi Vincent,
> > >
> > > Thanks for taking a look. More below...
> > >
> > > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > > Hi Phil,
> > > >
> > > > - reply to all this time
> > > >
> > > > On Wed, 6 May 2020 at 16:18, Phil Auld <pauld@redhat.com> wrote:
> > > > >
> > > > > sched/fair: Fix enqueue_task_fair warning some more
> > > > >
> > > > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > > updated the list.  In this case the second for_each_sched_entity loop can
> > > > > further modify se. The later code to fix up the list management fails to do
> > > >
> > > > But for the 2nd for_each_sched_entity loop, the cfs_rq should already be
> > > > in the list, shouldn't it?
> > >
> > > No. In this case we hit the parent-not-on-list case in list_add_leaf_cfs_rq,
> > > which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> > > the same. It returns false, expecting the parent to be added later.
> > >
> > > But then the parent doesn't get there because it's on_rq.
> > >
> > > >
> > > > The third for_each_sched_entity loop is there for the throttled case but is
> > > > useless in the other cases.
> > > >
> > >
> > > There actually is throttling involved, usually. The second loop breaks out
> > > early because one of the parents is throttled. But not before it advances
> > > se at least once.
> >
> > Ok, so it is indeed because of the throttling that the problem occurs.
> >
> > >
> > > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > > with the right se.
> > >
> > > > Could you provide us with some details about the use case that creates such
> > > > a situation?
> > > >
> > >
> > > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > > for the long lines...)
> > >
> > > 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > > 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist  Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > > 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > > 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > > 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > > 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > > 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> > >
> > >
> > > Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are from the
> > > first call to list_add_leaf_cfs_rq, with line 2 being at the start and line 3 showing which
> > > of the 3 cases we hit.
> > >
> > > Line 5 is right after the first loop.
> > >
> > > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > > Line 7 is right below the enqueue_throttle label.
> > >
> > > Lines 8 and 9 are from the fixup loop and, since onlist is set for both of these, it doesn't
> > > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > > from the one call to list_add_leaf_cfs_rq that did something, and so the cleanup doesn't work.
> > >
> > > Based on the comment at the cleanup, it looked like it expected the se to be what it was
> > > when the first loop broke, not whatever it was left at after the second loop. It could have
> > > been NULL there too, I guess, but I didn't hit that case.
> > >
> > > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> > >
> > > Does that make more sense?
> >
> > Yes, good catch.
> > And thanks for the detailed explanation.
> >
> > >
> > >
> > >
> > > Cheers,
> > > Phil
> > >
> > >
> > > > > what is needed because se does not point to the sched_entity which broke out
> > > > > of the first loop.
> > > > >
> > > > > Address this issue by saving the se pointer when the first loop exits and
> > > > > resetting it before doing the fix up, if needed.
> > > > >
> > > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > > > ---
> > > > >  kernel/sched/fair.c | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index 02f323b85b6d..719c996317e3 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > >  {
> > > > >         struct cfs_rq *cfs_rq;
> > > > >         struct sched_entity *se = &p->se;
> > > > > +       struct sched_entity *saved_se = NULL;
> > > > >         int idle_h_nr_running = task_has_idle_policy(p);
> > > > >
> > > > >         /*
> > > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > >                 flags = ENQUEUE_WAKEUP;
> > > > >         }
> > > > >
> > > > > +       saved_se = se;
> >
> > TBH, I don't like saving and going back to the saved se and looping one
> > more time on them.
> >
> > > > >         for_each_sched_entity(se) {
> > > > >                 cfs_rq = cfs_rq_of(se);
> >
> > Could you add something like below in the 2nd loop instead?
>
> This one solves the problem as well with no other visible issues.
>
> Do you want to spin it into a real patch?

You can respin your patch in a V2 with the proposal.
A Suggested-by from me will be enough.

Thanks
Vincent
>
> You can have my {reported/tested}-by or whatever you need.
>
>
> Cheers,
> Phil
>
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> > task_struct *p, int flags)
> >                 /* end evaluation on encountering a throttled cfs_rq */
> >                 if (cfs_rq_throttled(cfs_rq))
> >                         goto enqueue_throttle;
> > +
> > +               /*
> > +                * One parent has been throttled and cfs_rq removed from the
> > +                * list. Add it back to not break the leaf list.
> > +                */
> > +               if (throttled_hierarchy(cfs_rq))
> > +                       list_add_leaf_cfs_rq(cfs_rq);
> >         }
> >
> >  enqueue_throttle:
> >
> > > > >
> > > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > >                  * leaf list maintenance, resulting in triggering the assertion
> > > > >                  * below.
> > > > >                  */
> > > > > +               if (saved_se)
> > > > > +                       se = saved_se;
> > > > >                 for_each_sched_entity(se) {
> > > > >                         cfs_rq = cfs_rq_of(se);
> > > > >
> > > > > --
> > > > > 2.18.0
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Phil
> > > > >
> > > >
> > >
> > > --
> > >
> >
>
> --
>


* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-06 14:18 [PATCH] sched/fair: Fix enqueue_task_fair warning some more Phil Auld
  2020-05-06 16:36 ` Vincent Guittot
@ 2020-05-07 20:36 ` Phil Auld
  2020-05-08 15:15   ` Tao Zhou
  2020-05-11 19:25   ` Vincent Guittot
  2020-05-12 13:52 ` [PATCH v3] " Phil Auld
  2 siblings, 2 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-07 20:36 UTC (permalink / raw)
  To: peterz; +Cc: linux-kernel, mingo, vincent.guittot, juri.lelli

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se no longer points to the sched_entity which broke
out of the first loop.

Address this by calling leaf_add_rq_list if there are throttled parents while
doing the second for_each_sched_entity loop.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..c6d57c334d51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
+
+               /*
+                * One parent has been throttled and cfs_rq removed from the
+                * list. Add it back to not break the leaf list.
+                */
+               if (throttled_hierarchy(cfs_rq))
+                       list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 enqueue_throttle:
-- 
2.18.0

V2 rework the fix based on Vincent's suggestion. Thanks Vincent.


Cheers,
Phil

-- 



* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-07 20:36 ` [PATCH v2] " Phil Auld
@ 2020-05-08 15:15   ` Tao Zhou
  2020-05-08 15:27     ` Vincent Guittot
  2020-05-11 19:25   ` Vincent Guittot
  1 sibling, 1 reply; 29+ messages in thread
From: Tao Zhou @ 2020-05-08 15:15 UTC (permalink / raw)
  To: Phil Auld
  Cc: peterz, linux-kernel, mingo, vincent.guittot, juri.lelli, ouwen210

Hi Phil,

On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> sched/fair: Fix enqueue_task_fair warning some more
> 
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> did not fully resolve the issues with the rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list.  In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do
> what is needed because se no longer points to the sched_entity which broke
> out of the first loop.
> 

> Address this by calling leaf_add_rq_list if there are throttled parents while
> doing the second for_each_sched_entity loop.

Thanks for your trace information and explanation. I
have truly learned from this and that.

s/leaf_add_rq_list/list_add_leaf_cfs_rq/

> 
> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..c6d57c334d51 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  		/* end evaluation on encountering a throttled cfs_rq */
>  		if (cfs_rq_throttled(cfs_rq))
>  			goto enqueue_throttle;
> +
> +               /*
> +                * One parent has been throttled and cfs_rq removed from the
> +                * list. Add it back to not break the leaf list.
> +                */
> +               if (throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
>  	}

I was confused about why the throttled cfs_rq can be on the list.
It is possible when enqueuing a task, thanks to the 'threads'.
But I think the above comment does not truly capture the
intention, right?
If the throttled parent is on the list, the child cfs_rq is not
added to the leaf cfs_rq list, I think.

unthrottle_cfs_rq() follows the same logic, if I am not wrong.
Is it necessary to add the above to it?

Thanks,
Tau

>  
>  enqueue_throttle:
> -- 
> 2.18.0
> 
> V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> 
> 
> Cheers,
> Phil
> 
> -- 
> 


* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-08 15:15   ` Tao Zhou
@ 2020-05-08 15:27     ` Vincent Guittot
  2020-05-08 17:02       ` Tao Zhou
  0 siblings, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-08 15:27 UTC (permalink / raw)
  To: Tao Zhou
  Cc: Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli,
	Tao Zhou

On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>
> Hi Phil,
>
> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list.  In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> > what is needed because se no longer points to the sched_entity which broke
> > out of the first loop.
> >
>
> > Address this by calling leaf_add_rq_list if there are throttled parents while
> > doing the second for_each_sched_entity loop.
>
> Thanks for your trace information and explanation. I
> have truly learned from this and that.
>
> s/leaf_add_rq_list/list_add_leaf_cfs_rq/
>
> >
> > Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Juri Lelli <juri.lelli@redhat.com>
> > ---
> >  kernel/sched/fair.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..c6d57c334d51 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >               /* end evaluation on encountering a throttled cfs_rq */
> >               if (cfs_rq_throttled(cfs_rq))
> >                       goto enqueue_throttle;
> > +
> > +               /*
> > +                * One parent has been throttled and cfs_rq removed from the
> > +                * list. Add it back to not break the leaf list.
> > +                */
> > +               if (throttled_hierarchy(cfs_rq))
> > +                       list_add_leaf_cfs_rq(cfs_rq);
> >       }
>
> I was confused about why the throttled cfs_rq can be on the list.
> It is possible when enqueuing a task, thanks to the 'threads'.
> But I think the above comment does not truly capture the
> intention, right?
> If the throttled parent is on the list, the child cfs_rq is not
> added to the leaf cfs_rq list, I think.
>
> unthrottle_cfs_rq() follows the same logic, if I am not wrong.
> Is it necessary to add the above to it?

When a cfs_rq is throttled, its sched group is dequeued and all child
cfs_rq are removed from the leaf_cfs_rq list. But the sched groups of the
child cfs_rq stay enqueued in the throttled cfs_rq, so a child sched
group's ->on_rq might still be set.
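
That removal happens in the throttle path; tg_throttle_down(), which
throttle_cfs_rq() runs on each descendant via walk_tg_tree_from(), looks
roughly like this (from memory, so take the details with a grain of salt):

	static int tg_throttle_down(struct task_group *tg, void *data)
	{
		struct rq *rq = data;
		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

		/* group is entering throttled state, stop time */
		if (!cfs_rq->throttle_count) {
			cfs_rq->throttled_clock_task = rq_clock_task(rq);
			/* drop it from the leaf list ... */
			list_del_leaf_cfs_rq(cfs_rq);
		}
		cfs_rq->throttle_count++;

		/* ... but the child groups' se->on_rq is left untouched */
		return 0;
	}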

>
> Thanks,
> Tau
>
> >
> >  enqueue_throttle:
> > --
> > 2.18.0
> >
> > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> >
> >
> > Cheers,
> > Phil
> >
> > --
> >


* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-08 15:27     ` Vincent Guittot
@ 2020-05-08 17:02       ` Tao Zhou
  2020-05-11  8:36         ` Vincent Guittot
  2020-05-11  8:40         ` Dietmar Eggemann
  0 siblings, 2 replies; 29+ messages in thread
From: Tao Zhou @ 2020-05-08 17:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli,
	Tao Zhou

On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> >
> > Hi Phil,
> >
> > On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> > > sched/fair: Fix enqueue_task_fair warning some more
> > >
> > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning),
> > > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > updated the list.  In this case the second for_each_sched_entity loop can
> > > further modify se. The later code to fix up the list management fails to do
> > > what is needed because se no longer points to the sched_entity which broke
> > > out of the first loop.
> > >
> >
> > > Address this by calling leaf_add_rq_list if there are throttled parents while
> > > doing the second for_each_sched_entity loop.
> >
> > Thanks for your trace information and explanation. I
> > have truly learned from this and that.
> >
> > s/leaf_add_rq_list/list_add_leaf_cfs_rq/
> >
> > >
> > > Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > ---
> > >  kernel/sched/fair.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 02f323b85b6d..c6d57c334d51 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >               /* end evaluation on encountering a throttled cfs_rq */
> > >               if (cfs_rq_throttled(cfs_rq))
> > >                       goto enqueue_throttle;
> > > +
> > > +               /*
> > > +                * One parent has been throttled and cfs_rq removed from the
> > > +                * list. Add it back to not break the leaf list.
> > > +                */
> > > +               if (throttled_hierarchy(cfs_rq))
> > > +                       list_add_leaf_cfs_rq(cfs_rq);
> > >       }
> >
> > I was confused about why the throttled cfs_rq can be on the list.
> > It is possible when enqueuing a task, thanks to the 'threads'.
> > But I think the above comment does not truly capture the
> > intention, right?
> > If the throttled parent is on the list, the child cfs_rq is not
> > added to the leaf cfs_rq list, I think.
> >
> > unthrottle_cfs_rq() follows the same logic, if I am not wrong.
> > Is it necessary to add the above to it?
> 
> When a cfs_rq is throttled, its sched group is dequeued and all child
> cfs_rq are removed from the leaf_cfs_rq list. But the sched groups of the
> child cfs_rq stay enqueued in the throttled cfs_rq, so a child sched
> group->on_rq might still be set.

If there is a throttle inside a throttle, and we unthrottle the child
throttled cfs_rq (ugly):
                               ...
                                |
                      cfs_rq throttled (parent A)
                                |
                                |
                      cfs_rq in hierarchy (B)
                                |
                                |
                      cfs_rq throttled (C)
                                |
                               ...

Then unthrottle the child throttled cfs_rq C; now A is on the
leaf_cfs_rq list. The sched_group entity of C is enqueued to B, and
the sched_group entity of B is on_rq, so it is ignored by the enqueue
but is in the throttled hierarchy and not added to the leaf_cfs_rq list.
The above may be absolutely wrong and I may be missing something.

Another thing :
In enqueue_task_fair():

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		if (list_add_leaf_cfs_rq(cfs_rq))
			break;
	}

In unthrottle_cfs_rq():

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		list_add_leaf_cfs_rq(cfs_rq);
	}

The difference between them is the if condition; adding the if
condition to unthrottle_cfs_rq() may be an optimization and would
keep the two consistent.
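
Concretely, the guarded variant for unthrottle_cfs_rq() would look
something like the sketch below (untested; it assumes, as the enqueue
path does, that a true return from list_add_leaf_cfs_rq() means the
branch is fully connected):

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		/* stop once the leaf list is known to be fully connected */
		if (list_add_leaf_cfs_rq(cfs_rq))
			break;
	}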

> >
> > Thanks,
> > Tau
> >
> > >
> > >  enqueue_throttle:
> > > --
> > > 2.18.0
> > >
> > > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> > >
> > >
> > > Cheers,
> > > Phil
> > >
> > > --
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-08 17:02       ` Tao Zhou
@ 2020-05-11  8:36         ` Vincent Guittot
       [not found]           ` <BL0PR14MB37792D0FD629FFF1C9FEDE369AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
  2020-05-11  8:40         ` Dietmar Eggemann
  1 sibling, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11  8:36 UTC (permalink / raw)
  To: Tao Zhou
  Cc: Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli,
	Tao Zhou

Hi Tao,

On Fri, 8 May 2020 at 18:58, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>
> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> > On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> > >
> > > Hi Phil,
> > >
> > > On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > updated the list.  In this case the second for_each_sched_entity loop can
> > > > further modify se. The later code to fix up the list management fails to do
> > > > what is needed because se no longer points to the sched_entity which broke
> > > > out of the first loop.
> > > >
> > >
> > > > Address this by calling leaf_add_rq_list if there are throttled parents while
> > > > doing the second for_each_sched_entity loop.
> > >
> > > Thanks for your trace information and explanation. I have
> > > truly learned from this and that.
> > >
> > > s/leaf_add_rq_list/list_add_leaf_cfs_rq/
> > >
> > > >
> > > > Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Ingo Molnar <mingo@kernel.org>
> > > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > > ---
> > > >  kernel/sched/fair.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..c6d57c334d51 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >               /* end evaluation on encountering a throttled cfs_rq */
> > > >               if (cfs_rq_throttled(cfs_rq))
> > > >                       goto enqueue_throttle;
> > > > +
> > > > +               /*
> > > > +                * One parent has been throttled and cfs_rq removed from the
> > > > +                * list. Add it back to not break the leaf list.
> > > > +                */
> > > > +               if (throttled_hierarchy(cfs_rq))
> > > > +                       list_add_leaf_cfs_rq(cfs_rq);
> > > >       }
> > >
> > > I was confused about why the throttled cfs_rq can be on the list.
> > > It is possible when enqueueing a task, thanks to the 'threads'.
> > > But I think the above comment does not truly convey the right
> > > intention, right ?
> > > If the throttled parent is on the list, the child cfs_rq is skipped
> > > and not added to the leaf cfs_rq list, I think.
> > >
> > > unthrottle_cfs_rq() follows the same logic, if I am not wrong.
> > > Is it necessary to add the above to it ?
> >
> > When a cfs_rq is throttled, its sched group is dequeued and all child
> > cfs_rq are removed from the leaf_cfs_rq list. But the sched groups of the
> > child cfs_rq stay enqueued in the throttled cfs_rq, so a child sched
> > group->on_rq might still be set.
>
> If there is a throttle inside a throttle, and we unthrottle the child
> throttled cfs_rq (ugly):
>                                ...
>                                 |
>                       cfs_rq throttled (parent A)
>                                 |
>                                 |
>                       cfs_rq in hierarchy (B)
>                                 |
>                                 |
>                       cfs_rq throttled (C)
>                                 |
>                                ...
>
> Then unthrottle the child throttled cfs_rq C; now A is on the
> leaf_cfs_rq list. The sched_group entity of C is enqueued to B, and
> the sched_group entity of B is on_rq, so it is ignored by the enqueue
> but is in the throttled hierarchy and not added to the leaf_cfs_rq list.
> The above may be absolutely wrong and I may be missing something.

several things:

your example above is safe IMO because when C is unthrottled, its
group se will be enqueued on B, which will be added to the leaf_cfs_rq
list.
Then the group se of B is already on_rq but A is throttled and the 1st
loop breaks. The 2nd loop will ensure that A is added to the leaf_cfs_rq
list.

Now, if we add one more level between C and A, we have a problem and
we should add something similar in the else

Finally, while checking unthrottle_cfs_rq(), the test "if
(!cfs_rq->load.weight) return" skips all the for_each_sched_entity loops
and can break the leaf_cfs_rq list.

We need to jump to the last loop in such a case.
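
One possible shape of that jump, as a sketch only (the label name is
invented here, and whether the jump is needed at all is revisited later
in the thread):

	/* in unthrottle_cfs_rq(), sketch only */
	if (!cfs_rq->load.weight)
		goto unthrottle_throttle;	/* still repair the leaf list */

	...

unthrottle_throttle:
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		if (list_add_leaf_cfs_rq(cfs_rq))
			break;
	}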

>
> Another thing :
> In enqueue_task_fair():
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 if (list_add_leaf_cfs_rq(cfs_rq))
>                         break;
>         }
>
> In unthrottle_cfs_rq():
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 list_add_leaf_cfs_rq(cfs_rq);
>         }
>
> The difference between them is the if condition; adding the if
> condition to unthrottle_cfs_rq() may be an optimization and would
> keep the two consistent.

Yes we can do the same kind of optimization

>
> > >
> > > Thanks,
> > > Tau
> > >
> > > >
> > > >  enqueue_throttle:
> > > > --
> > > > 2.18.0
> > > >
> > > > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > > > --
> > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-08 17:02       ` Tao Zhou
  2020-05-11  8:36         ` Vincent Guittot
@ 2020-05-11  8:40         ` Dietmar Eggemann
  2020-05-11  9:36           ` Vincent Guittot
  1 sibling, 1 reply; 29+ messages in thread
From: Dietmar Eggemann @ 2020-05-11  8:40 UTC (permalink / raw)
  To: Tao Zhou, Vincent Guittot
  Cc: Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli,
	Tao Zhou

On 08/05/2020 19:02, Tao Zhou wrote:
> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>>>
>>> Hi Phil,
>>>
>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
>>>> sched/fair: Fix enqueue_task_fair warning some more

[...]

>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 02f323b85b6d..c6d57c334d51 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>>>               /* end evaluation on encountering a throttled cfs_rq */
>>>>               if (cfs_rq_throttled(cfs_rq))
>>>>                       goto enqueue_throttle;
>>>> +
>>>> +               /*
>>>> +                * One parent has been throttled and cfs_rq removed from the
>>>> +                * list. Add it back to not break the leaf list.
>>>> +                */
>>>> +               if (throttled_hierarchy(cfs_rq))
>>>> +                       list_add_leaf_cfs_rq(cfs_rq);
>>>>       }
>>>
>>> I was confused about why the throttled cfs_rq can be on the list.
>>> It is possible when enqueueing a task, thanks to the 'threads'.
>>> But I think the above comment does not truly convey the right
>>> intention, right ?
>>> If the throttled parent is on the list, the child cfs_rq is skipped
>>> and not added to the leaf cfs_rq list, I think.
>>>
>>> unthrottle_cfs_rq() follows the same logic, if I am not wrong.
>>> Is it necessary to add the above to it ?
>>
>> When a cfs_rq is throttled, its sched group is dequeued and all child
>> cfs_rq are removed from the leaf_cfs_rq list. But the sched groups of the
>> child cfs_rq stay enqueued in the throttled cfs_rq, so a child sched
>> group->on_rq might still be set.
> 
> If there is a throttle inside a throttle, and we unthrottle the child
> throttled cfs_rq (ugly):
>                                ...
>                                 |
>                       cfs_rq throttled (parent A)
>                                 |
>                                 |
>                       cfs_rq in hierarchy (B)
>                                 |
>                                 |
>                       cfs_rq throttled (C)
>                                 |
>                                ...
> 
> Then unthrottle the child throttled cfs_rq C; now A is on the
> leaf_cfs_rq list. The sched_group entity of C is enqueued to B, and
> the sched_group entity of B is on_rq, so it is ignored by the enqueue
> but is in the throttled hierarchy and not added to the leaf_cfs_rq list.
> The above may be absolutely wrong and I may be missing something.
> 
> Another thing :
> In enqueue_task_fair():
> 
> 	for_each_sched_entity(se) {
> 		cfs_rq = cfs_rq_of(se);
> 
> 		if (list_add_leaf_cfs_rq(cfs_rq))
> 			break;
> 	}
> 
> In unthrottle_cfs_rq():
> 
> 	for_each_sched_entity(se) {
> 		cfs_rq = cfs_rq_of(se);
> 
> 		list_add_leaf_cfs_rq(cfs_rq);
> 	}
> 
> The difference between them is the if condition; adding the if
> condition to unthrottle_cfs_rq() may be an optimization and would
> keep the two consistent.
> 

I'm not 100% sure if this is exactly what Tao pointed out here but I
also had difficulties understanding how this patch works:

                       p.se
                        |
      __________________|
      |
      V
     cfs_c -> tg_c ->  se_c (se->on_rq = 1)
                        |
      __________________|
      |
      v
     cfs_b -> tg_b ->  se_b
                        |
      __________________|
      |
      V
     cfs_a -> tg_a ->  se_a
                        |
      __________________|
      |
      V
     cfs_r -> tg_r
      |
      V
      rq

(1) The incomplete update happens with cfs_c at the end of
    enqueue_entity() in the first loop because of 'if ( .... ||
    cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)

(2) se_c breaks out of the first loop (se_c->on_rq = 1)

(3) With the patch cfs_b is added back to the list.
    But only because cfs_a->on_list=1.

But since cfs_a is throttled it should be cfs_a->on_list=0 as well.
throttle_cfs_rq()->walk_tg_tree_from(..., tg_throttle_down, ...) should
include cfs_a when calling list_del_leaf_cfs_rq().

IMHO, throttle_cfs_rq() calls tg_throttle_down() for the throttled
cfs_rq too.
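
For reference, the helper in question looks roughly like this in the
fair.c of this era (quoted from memory, so minor details may differ);
since walk_tg_tree_from() starts at the throttled tg itself, cfs_a is
deleted from the list when it gets throttled:

static int tg_throttle_down(struct task_group *tg, void *data)
{
	struct rq *rq = data;
	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

	/* group is entering throttled state, stop time */
	if (!cfs_rq->throttle_count) {
		cfs_rq->throttled_clock_task = rq_clock_task(rq);
		list_del_leaf_cfs_rq(cfs_rq);
	}
	cfs_rq->throttle_count++;

	return 0;
}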


Another thing: Why don't we use throttled_hierarchy(cfs_rq) instead of
cfs_bandwidth_used() in enqueue_entity() as well?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11  8:40         ` Dietmar Eggemann
@ 2020-05-11  9:36           ` Vincent Guittot
  2020-05-11 10:39             ` Dietmar Eggemann
  0 siblings, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11  9:36 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Tao Zhou, Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Juri Lelli, Tao Zhou

On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 08/05/2020 19:02, Tao Zhou wrote:
> > On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> >> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> >>>
> >>> Hi Phil,
> >>>
> >>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> >>>> sched/fair: Fix enqueue_task_fair warning some more
>
> [...]
>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index 02f323b85b6d..c6d57c334d51 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >>>>               /* end evaluation on encountering a throttled cfs_rq */
> >>>>               if (cfs_rq_throttled(cfs_rq))
> >>>>                       goto enqueue_throttle;
> >>>> +
> >>>> +               /*
> >>>> +                * One parent has been throttled and cfs_rq removed from the
> >>>> +                * list. Add it back to not break the leaf list.
> >>>> +                */
> >>>> +               if (throttled_hierarchy(cfs_rq))
> >>>> +                       list_add_leaf_cfs_rq(cfs_rq);
> >>>>       }
> >>>
> >>> I was confused about why the throttled cfs_rq can be on the list.
> >>> It is possible when enqueueing a task, thanks to the 'threads'.
> >>> But I think the above comment does not truly convey the right
> >>> intention, right ?
> >>> If the throttled parent is on the list, the child cfs_rq is skipped
> >>> and not added to the leaf cfs_rq list, I think.
> >>>
> >>> unthrottle_cfs_rq() follows the same logic, if I am not wrong.
> >>> Is it necessary to add the above to it ?
> >>
> >> When a cfs_rq is throttled, its sched group is dequeued and all child
> >> cfs_rq are removed from the leaf_cfs_rq list. But the sched groups of the
> >> child cfs_rq stay enqueued in the throttled cfs_rq, so a child sched
> >> group->on_rq might still be set.
> >
> > If there is a throttle inside a throttle, and we unthrottle the child
> > throttled cfs_rq (ugly):
> >                                ...
> >                                 |
> >                       cfs_rq throttled (parent A)
> >                                 |
> >                                 |
> >                       cfs_rq in hierarchy (B)
> >                                 |
> >                                 |
> >                       cfs_rq throttled (C)
> >                                 |
> >                                ...
> >
> > Then unthrottle the child throttled cfs_rq C; now A is on the
> > leaf_cfs_rq list. The sched_group entity of C is enqueued to B, and
> > the sched_group entity of B is on_rq, so it is ignored by the enqueue
> > but is in the throttled hierarchy and not added to the leaf_cfs_rq list.
> > The above may be absolutely wrong and I may be missing something.
> >
> > Another thing :
> > In enqueue_task_fair():
> >
> >       for_each_sched_entity(se) {
> >               cfs_rq = cfs_rq_of(se);
> >
> >               if (list_add_leaf_cfs_rq(cfs_rq))
> >                       break;
> >       }
> >
> > In unthrottle_cfs_rq():
> >
> >       for_each_sched_entity(se) {
> >               cfs_rq = cfs_rq_of(se);
> >
> >               list_add_leaf_cfs_rq(cfs_rq);
> >       }
> >
> > The difference between them is the if condition; adding the if
> > condition to unthrottle_cfs_rq() may be an optimization and would
> > keep the two consistent.
> >
>
> I'm not 100% sure if this is exactly what Tao pointed out here but I
> also had difficulties understanding how this patch works:
>
>                        p.se
>                         |
>       __________________|
>       |
>       V
>      cfs_c -> tg_c ->  se_c (se->on_rq = 1)
>                         |
>       __________________|
>       |
>       v
>      cfs_b -> tg_b ->  se_b
>                         |
>       __________________|
>       |
>       V
>      cfs_a -> tg_a ->  se_a
>                         |
>       __________________|
>       |
>       V
>      cfs_r -> tg_r
>       |
>       V
>       rq
>

In your example, which cfs_rq has been throttled ? cfs_a ?

> (1) The incomplete update happens with cfs_c at the end of
>     enqueue_entity() in the first loop because of 'if ( .... ||
>     cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)

so cfs_c is added with the 1st loop

>
> (2) se_c breaks out of the first loop (se_c->on_rq = 1)
>
> (3) With the patch cfs_b is added back to the list.
>     But only because cfs_a->on_list=1.

hmm I don't understand the link between cfs_b being added and cfs_a->on_list=1

cfs_b is added with 2nd loop because its throttle_count > 0 due to
cfs_a being throttled (purpose of this patch)

>
> But since cfs_a is throttled it should be cfs_a->on_list=0 as well.

So 2nd loop breaks because cfs_a is throttled

The 3rd loop will add cfs_a

> throttle_cfs_rq()->walk_tg_tree_from(..., tg_throttle_down, ...) should
> include cfs_a when calling list_del_leaf_cfs_rq().
>
> IMHO, throttle_cfs_rq() calls tg_throttle_down() for the throttled
> cfs_rq too.
>
>
> Another thing: Why don't we use throttled_hierarchy(cfs_rq) instead of
> cfs_bandwidth_used() in enqueue_entity() as well?

Mainly to be conservative because as this patch demonstrates, there
are a lot of possible use cases and combinations and I can't ensure
that it is always safe to use the throttled_hierarchy.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11  9:36           ` Vincent Guittot
@ 2020-05-11 10:39             ` Dietmar Eggemann
  2020-05-11 12:12               ` Vincent Guittot
       [not found]               ` <BL0PR14MB3779ED5E2E5AD157B58D002C9AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
  0 siblings, 2 replies; 29+ messages in thread
From: Dietmar Eggemann @ 2020-05-11 10:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Tao Zhou, Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Juri Lelli, Tao Zhou

On 11/05/2020 11:36, Vincent Guittot wrote:
> On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>
>> On 08/05/2020 19:02, Tao Zhou wrote:
>>> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
>>>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>>>>>
>>>>> Hi Phil,
>>>>>
>>>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
>>>>>> sched/fair: Fix enqueue_task_fair warning some more

[...]

>> I'm not 100% sure if this is exactly what Tao pointed out here but I
>> also had difficulties understanding how this patch works:
>>
>>                        p.se
>>                         |
>>       __________________|
>>       |
>>       V
>>      cfs_c -> tg_c ->  se_c (se->on_rq = 1)
>>                         |
>>       __________________|
>>       |
>>       v
>>      cfs_b -> tg_b ->  se_b
>>                         |
>>       __________________|
>>       |
>>       V
>>      cfs_a -> tg_a ->  se_a
>>                         |
>>       __________________|
>>       |
>>       V
>>      cfs_r -> tg_r
>>       |
>>       V
>>       rq
>>
> 
> In your example, which cfs_rq has been throttled ? cfs_a ?

Yes, cfs_a. 0xffffa085e48ce000 in Phil's trace.

> 
>> (1) The incomplete update happens with cfs_c at the end of
>>     enqueue_entity() in the first loop because of 'if ( .... ||
>>     cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)
> 
> so cfs_c is added with the 1st loop

Yes.

>> (2) se_c breaks out of the first loop (se_c->on_rq = 1)
>>
>> (3) With the patch cfs_b is added back to the list.
>>     But only because cfs_a->on_list=1.
> 
> hmm I don't understand the link between cfs_b being added and cfs_a->on_list=1

cfs_b, 0xffffa085e48ce000, is the one which is now added in the 2nd loop.

Isn't the link between cfs_b and cfs_a the first if condition in
list_add_leaf_cfs_rq():

  if (cfs_rq->tg->parent &&
      cfs_rq->tg->parent->cfs_rq[cpu]->on_list)

to 'connect the branch' or not (default, returning false case)?

> cfs_b is added with 2nd loop because its throttle_count > 0 due to
> cfs_a being throttled (purpose of this patch)
> 
>>
>> But since cfs_a is throttled it should be cfs_a->on_list=0 as well.
> 
> So 2nd loop breaks because cfs_a is throttled

Yes.

> The 3rd loop will add cfs_a

Yes, but in the example, cfs_a->on_list=1, so we bail out of
list_add_leaf_cfs_rq() early.

I don't grasp how cfs_a->on_list can be 1 when cfs_a is throttled and
cfs_b, cfs_c are in a throttled hierarchy?

>> throttle_cfs_rq()->walk_tg_tree_from(..., tg_throttle_down, ...) should
>> include cfs_a when calling list_del_leaf_cfs_rq().
>>
>> IMHO, throttle_cfs_rq() calls tg_throttle_down() for the throttled
>> cfs_rq too.
>>
>>
>> Another thing: Why don't we use throttled_hierarchy(cfs_rq) instead of
>> cfs_bandwidth_used() in enqueue_entity() as well?
> 
> Mainly to be conservative because as this patch demonstrates, there
> are a lot of possible use cases and combinations and I can't ensure
> that it is always safe to use the throttled_hierarchy.

Maybe this deserves a comment then.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 10:39             ` Dietmar Eggemann
@ 2020-05-11 12:12               ` Vincent Guittot
  2020-05-11 17:02                 ` Dietmar Eggemann
       [not found]               ` <BL0PR14MB3779ED5E2E5AD157B58D002C9AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
  1 sibling, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11 12:12 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Tao Zhou, Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Juri Lelli, Tao Zhou

On Mon, 11 May 2020 at 12:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 11/05/2020 11:36, Vincent Guittot wrote:
> > On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>
> >> On 08/05/2020 19:02, Tao Zhou wrote:
> >>> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> >>>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> >>>>>
> >>>>> Hi Phil,
> >>>>>
> >>>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> >>>>>> sched/fair: Fix enqueue_task_fair warning some more
>
> [...]
>
> >> I'm not 100% sure if this is exactly what Tao pointed out here but I
> >> also had difficulties understanding how this patch works:
> >>
> >>                        p.se
> >>                         |
> >>       __________________|
> >>       |
> >>       V
> >>      cfs_c -> tg_c ->  se_c (se->on_rq = 1)
> >>                         |
> >>       __________________|
> >>       |
> >>       v
> >>      cfs_b -> tg_b ->  se_b
> >>                         |
> >>       __________________|
> >>       |
> >>       V
> >>      cfs_a -> tg_a ->  se_a
> >>                         |
> >>       __________________|
> >>       |
> >>       V
> >>      cfs_r -> tg_r
> >>       |
> >>       V
> >>       rq
> >>
> >
> > In your example, which cfs_rq has been throttled ? cfs_a ?
>
> Yes, cfs_a. 0xffffa085e48ce000 in Phil's trace.
>
> >
> >> (1) The incomplete update happens with cfs_c at the end of
> >>     enqueue_entity() in the first loop because of 'if ( .... ||
> >>     cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)
> >
> > so cfs_c is added with the 1st loop
>
> Yes.
>
> >> (2) se_c breaks out of the first loop (se_c->on_rq = 1)
> >>
> >> (3) With the patch cfs_b is added back to the list.
> >>     But only because cfs_a->on_list=1.
> >
> > hmm I don't understand the link between cfs_b being added and cfs_a->on_list=1
>
> cfs_b, 0xffffa085e48ce000, is the one which is now added in the 2nd loop.
>
> Isn't the link between cfs_b and cfs_a the first if condition in

on_list is only there to say if the cfs_rq is already in the list but
there is no dependency on the child

> list_add_leaf_cfs_rq():
>
>   if (cfs_rq->tg->parent &&
>       cfs_rq->tg->parent->cfs_rq[cpu]->on_list)
>
> to 'connect the branch' or not (default, returning false case)?
>

In your example above, if the parent is already on the list, then we
know where to insert the child.
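
The three cases are easiest to see side by side. Roughly, the function
is structured like this (paraphrased from the fair.c of that time, so
treat it as a sketch rather than an exact quote):

static inline bool list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
	struct rq *rq = rq_of(cfs_rq);
	int cpu = cpu_of(rq);

	if (cfs_rq->on_list)
		return rq->tmp_alone_branch == &rq->leaf_cfs_rq_list;

	cfs_rq->on_list = 1;

	/* 1) parent already on the list: insert right before it, the
	 *    branch is complete */
	if (cfs_rq->tg->parent &&
	    cfs_rq->tg->parent->cfs_rq[cpu]->on_list) {
		list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
			&(cfs_rq->tg->parent->cfs_rq[cpu]->leaf_cfs_rq_list));
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
		return true;
	}

	/* 2) root cfs_rq: goes last, the branch is complete */
	if (!cfs_rq->tg->parent) {
		list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
				  &rq->leaf_cfs_rq_list);
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
		return true;
	}

	/* 3) parent not on the list yet: grow the still-alone branch
	 *    and wait for a later call to connect it */
	list_add_rcu(&cfs_rq->leaf_cfs_rq_list, rq->tmp_alone_branch);
	rq->tmp_alone_branch = &cfs_rq->leaf_cfs_rq_list;
	return false;
}

Case 3) is what leaves rq->tmp_alone_branch pointing into the open
branch until a later call hits case 1) or 2).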

> > cfs_b is added with 2nd loop because its throttle_count > 0 due to
> > cfs_a being throttled (purpose of this patch)
> >
> >>
> >> But since cfs_a is throttled it should be cfs_a->on_list=0 as well.
> >
> > So 2nd loop breaks because cfs_a is throttled
>
> Yes.
>
> > The 3rd loop will add cfs_a
>
> Yes, but in the example, cfs_a->on_list=1, so we bail out of
> list_add_leaf_cfs_rq() early.

Because the cfs_rq is already on the list, we don't have to add it

>
> I don't grasp how cfs_a->on_list can be 1 when cfs_a is throttled and
> cfs_b, cfs_c are in a throttled hierarchy?
>
> >> throttle_cfs_rq()->walk_tg_tree_from(..., tg_throttle_down, ...) should
> >> include cfs_a when calling list_del_leaf_cfs_rq().
> >>
> >> IMHO, throttle_cfs_rq() calls tg_throttle_down() for the throttled
> >> cfs_rq too.
> >>
> >>
> >> Another thing: Why don't we use throttled_hierarchy(cfs_rq) instead of
> >> cfs_bandwidth_used() in enqueue_entity() as well?
> >
> > Mainly to be conservative because as this patch demonstrates, there
> > are a lot of possible use cases and combinations and I can't ensure
> > that it is always safe to use the throttled_hierarchy.
>
> Maybe this deserves a comment then.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 12:12               ` Vincent Guittot
@ 2020-05-11 17:02                 ` Dietmar Eggemann
  2020-05-11 17:14                   ` Vincent Guittot
  0 siblings, 1 reply; 29+ messages in thread
From: Dietmar Eggemann @ 2020-05-11 17:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Tao Zhou, Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Juri Lelli, Tao Zhou

On 11/05/2020 14:12, Vincent Guittot wrote:
> On Mon, 11 May 2020 at 12:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>
>> On 11/05/2020 11:36, Vincent Guittot wrote:
>>> On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>>
>>>> On 08/05/2020 19:02, Tao Zhou wrote:
>>>>> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
>>>>>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>>>>>>>
>>>>>>> Hi Phil,
>>>>>>>
>>>>>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
>>>>>>>> sched/fair: Fix enqueue_task_fair warning some more
>>
>> [...]
>>
>>>> I'm not 100% sure if this is exactly what Tao pointed out here but I
>>>> also had difficulties understanding how this patch works:
>>>>
>>>>                        p.se
>>>>                         |
>>>>       __________________|
>>>>       |
>>>>       V
>>>>      cfs_c -> tg_c ->  se_c (se->on_rq = 1)
>>>>                         |
>>>>       __________________|
>>>>       |
>>>>       v
>>>>      cfs_b -> tg_b ->  se_b
>>>>                         |
>>>>       __________________|
>>>>       |
>>>>       V
>>>>      cfs_a -> tg_a ->  se_a
>>>>                         |
>>>>       __________________|
>>>>       |
>>>>       V
>>>>      cfs_r -> tg_r
>>>>       |
>>>>       V
>>>>       rq
>>>>
>>>
>>> In your example, which cfs_rq has been throttled ? cfs_a ?
>>
>> Yes, cfs_a. 0xffffa085e48ce000 in Phil's trace.
>>
>>>
>>>> (1) The incomplete update happens with cfs_c at the end of
>>>>     enqueue_entity() in the first loop because of 'if ( .... ||
>>>>     cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)
>>>
>>> so cfs_c is added with the 1st loop
>>
>> Yes.
>>
>>>> (2) se_c breaks out of the first loop (se_c->on_rq = 1)
>>>>
>>>> (3) With the patch cfs_b is added back to the list.
>>>>     But only because cfs_a->on_list=1.
>>>
>>> hmm I don't understand the link between cfs_b being added and cfs_a->on_list=1
>>
>> cfs_b, 0xffffa085e48ce000, is the one which is now added in the 2nd loop.
>>
>> Isn't the link between cfs_b and cfs_a the first if condition in
> 
> on_list is only there to say if the cfs_rq is already in the list but
> there is no dependency on the child

Yes, I agree. But coming back to what the patch does in the example:

W/ the patch, list_add_leaf_cfs_rq() is now called for cfs_b, and since
cfs_b->tg->parent's cfs_rq is cfs_a and cfs_a->on_list=1, the 'branch is
now connected', which means rq->tmp_alone_branch = &rq->leaf_cfs_rq_list.

I.e. assert_list_leaf_cfs_rq() at the end of enqueue_task_fair() is not
barfing anymore.

W/o the patch, list_add_leaf_cfs_rq(), called w/ cfs_c, left the 'branch
open'; it's not called on cfs_b, and since cfs_a->on_list=1, the 3rd
for_each_sched_entity() loop in enqueue_task_fair() doesn't 'connect the
branch', so the assert fires.
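
For readers following along, the assert in question boils down to
something like this (again from memory of the fair.c of the time):

static inline void assert_list_leaf_cfs_rq(struct rq *rq)
{
	SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
}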

What I don't immediately see is how cfs_a can be throttled (which makes
cfs_b -> cfs_c a throttled hierarchy) and be on the list
(cfs_a->on_list=1) at the same time.

So the only way this could happen is if there was a task enqueue in a
parallel cfs_b' (another child of cfs_a) sub-hierarchy just before
the example.

>> list_add_leaf_cfs_rq():
>>
>>   if (cfs_rq->tg->parent &&
>>       cfs_rq->tg->parent->cfs_rq[cpu]->on_list)
>>
>> to 'connect the branch' or not (default, returning false case)?
>>
> 
> In your example above, if the parent is already on the list, then we
> know where to insert the child.

True, we go to the 2nd if() condition in list_add_leaf_cfs_rq().

>>> cfs_b is added with 2nd loop because its throttle_count > 0 due to
>>> cfs_a being throttled (purpose of this patch)
>>>
>>>>
>>>> But since cfs_a is throttled it should be cfs_a->on_list=0 as well.
>>>
>>> So 2nd loop breaks because cfs_a is throttled
>>
>> Yes.
>>
>>> The 3rd loop will add cfs_a
>>
>> Yes, but in the example, cfs_a->on_list=1, so we bail out of
>> list_add_leaf_cfs_rq() early.
> 
> Because the cfs_rq is already on the list, we don't have to add it

Yes.

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
       [not found]               ` <BL0PR14MB3779ED5E2E5AD157B58D002C9AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
@ 2020-05-11 17:03                 ` Dietmar Eggemann
  0 siblings, 0 replies; 29+ messages in thread
From: Dietmar Eggemann @ 2020-05-11 17:03 UTC (permalink / raw)
  To: Tao Zhou
  Cc: Vincent Guittot, Phil Auld, Peter Zijlstra, linux-kernel,
	Ingo Molnar, Juri Lelli

Hi Tao,

On 11/05/2020 17:44, Tao Zhou wrote:
> Hi Dietmar,

[...]

> On Mon, May 11, 2020 at 12:39:52PM +0200, Dietmar Eggemann wrote:
>> On 11/05/2020 11:36, Vincent Guittot wrote:
>>> On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>>
>>>> On 08/05/2020 19:02, Tao Zhou wrote:
>>>>> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
>>>>>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
>>>>>>>
>>>>>>> Hi Phil,
>>>>>>>
>>>>>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
>>>>>>>> sched/fair: Fix enqueue_task_fair warning some more

[...]

>> I don't grasp how cfs_a->on_list can be 1 when cfs_a is throttled and
>> cfs_b, cfs_c are in a throttled hierarchy?
> 
> I remember that Vincent explained that in this thread:
> 
> https://lore.kernel.org/lkml/CAKfTPtDxE32RrTusYTBUcwYoJFvadLLaMUp7gOsXdj_zQcaWdA@mail.gmail.com/
> 
> This is what confused me also. When enqueueing a task, the throttled
> cfs_rq may be added back to the leaf_cfs_rq list.

As long as we only consider one hierarchy, I can't see how we can
enqueue a task and hit cfs_a->on_list=1 on a throttled cfs_a.

But there might be a cfs_b' (another child of cfs_a) sub-hierarchy which
had a task enqueue just before, and this set cfs_a->on_list=1.

Tried to read the email you pointed at carefully but can't see it there
... pretty tired right now, maybe tomorrow?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 17:02                 ` Dietmar Eggemann
@ 2020-05-11 17:14                   ` Vincent Guittot
  0 siblings, 0 replies; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11 17:14 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Tao Zhou, Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Juri Lelli, Tao Zhou

On Mon, 11 May 2020 at 19:02, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 11/05/2020 14:12, Vincent Guittot wrote:
> > On Mon, 11 May 2020 at 12:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>
> >> On 11/05/2020 11:36, Vincent Guittot wrote:
> >>> On Mon, 11 May 2020 at 10:40, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>>>
> >>>> On 08/05/2020 19:02, Tao Zhou wrote:
> >>>>> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> >>>>>> On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> >>>>>>>
> >>>>>>> Hi Phil,
> >>>>>>>
> >>>>>>> On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> >>>>>>>> sched/fair: Fix enqueue_task_fair warning some more
> >>
> >> [...]
> >>
> >>>> I'm not 100% sure if this is exactly what Tao pointed out here but I
> >>>> also had difficulties understanding how this patch works:
> >>>>
> >>>>                        p.se
> >>>>                         |
> >>>>       __________________|
> >>>>       |
> >>>>       V
> >>>>      cfs_c -> tg_c ->  se_c (se->on_rq = 1)
> >>>>                         |
> >>>>       __________________|
> >>>>       |
> >>>>       v
> >>>>      cfs_b -> tg_b ->  se_b
> >>>>                         |
> >>>>       __________________|
> >>>>       |
> >>>>       V
> >>>>      cfs_a -> tg_a ->  se_a
> >>>>                         |
> >>>>       __________________|
> >>>>       |
> >>>>       V
> >>>>      cfs_r -> tg_r
> >>>>       |
> >>>>       V
> >>>>       rq
> >>>>
> >>>
> >>> In your example, which cfs_rq has been throttled ? cfs_a ?
> >>
> >> Yes, cfs_a. 0xffffa085e48ce000 in Phil's trace.
> >>
> >>>
> >>>> (1) The incomplete update happens with cfs_c at the end of
> >>>>     enqueue_entity() in the first loop because of 'if ( .... ||
> >>>>     cfs_bandwidth_used())' (cfs_b->on_list=0 since cfs_a is throttled)
> >>>
> >>> so cfs_c is added with the 1st loop
> >>
> >> Yes.
> >>
> >>>> (2) se_c breaks out of the first loop (se_c->on_rq = 1)
> >>>>
> >>>> (3) With the patch cfs_b is added back to the list.
> >>>>     But only because cfs_a->on_list=1.
> >>>
> >>> hmm I don't understand the link between cfs_b being added and cfs_a->on_list=1
> >>
> >> cfs_b, 0xffffa085e48ce000, is the one which is now added in the 2nd loop.
> >>
> >> Isn't the link between cfs_b and cfs_a the first if condition in
> >
> > on_list is only there to say if the cfs_rq is already in the list but
> > there is no dependency on the child
>
> Yes, I agree. But coming back to what the patch does in the example:
>
> W/ the patch, list_add_leaf_cfs_rq() is now called for cfs_b, and since
> cfs_b->tg->parent's cfs_rq is cfs_a and cfs_a->on_list=1, the 'branch is
> now connected', which means rq->tmp_alone_branch = &rq->leaf_cfs_rq_list.
>
> I.e. assert_list_leaf_cfs_rq() at the end of enqueue_task_fair() is not
> barfing anymore.
>
> W/o the patch, list_add_leaf_cfs_rq(), called w/ cfs_c, left the 'branch
> open'; it's not called on cfs_b, and since cfs_a->on_list=1, the 3rd
> for_each_sched_entity() loop in enqueue_task_fair() doesn't 'connect the
> branch', so the assert fires.
>
> What I don't immediately see is how cfs_a can be throttled (which makes
> cfs_b -> cfs_c a throttled hierarchy) and be on the list
> (cfs_a->on_list=1) at the same time.
>
> So the only way this could happen is if there was a task enqueue in a
> parallel cfs_b' (another child of cfs_a) sub-hierarchy just before
> the example.

Yes. A task has been enqueued on another child (cfs_b') and cfs_a has
been added back to ensure that the cfs_rqs are correctly ordered.

>
> >> list_add_leaf_cfs_rq():
> >>
> >>   if (cfs_rq->tg->parent &&
> >>       cfs_rq->tg->parent->cfs_rq[cpu]->on_list)
> >>
> >> to 'connect the branch' or not (default, returning false case)?
> >>
> >
> > In your example above, if the parent is already on the list, then we
> > know where to insert the child.
>
> True, we go to the 2nd if() condition in list_add_leaf_cfs_rq().
>
> >>> cfs_b is added with 2nd loop because its throttle_count > 0 due to
> >>> cfs_a being throttled (purpose of this patch)
> >>>
> >>>>
> >>>> But since cfs_a is throttled it should be cfs_a->on_list=0 as well.
> >>>
> >>> So 2nd loop breaks because cfs_a is throttled
> >>
> >> Yes.
> >>
> >>> The 3rd loop will add cfs_a
> >>
> >> Yes, but in the example, cfs_a->on_list=1, so we bail out of
> >> list_add_leaf_cfs_rq() early.
> >
> > Because the cfs_rq is already on the list, we don't have to add it
>
> Yes.
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
       [not found]           ` <BL0PR14MB37792D0FD629FFF1C9FEDE369AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
@ 2020-05-11 19:22             ` Vincent Guittot
  0 siblings, 0 replies; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11 19:22 UTC (permalink / raw)
  To: Tao Zhou; +Cc: Phil Auld, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

On Mon, 11 May 2020 at 17:13, Tao Zhou <ouwen210@hotmail.com> wrote:
>
> Hi Vincent,
>
> On Mon, May 11, 2020 at 10:36:43AM +0200, Vincent Guittot wrote:
> > Hi Tao,
> >
> > On Fri, 8 May 2020 at 18:58, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> > >
> > > On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> > > > On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooouoto@zoho.com.cn> wrote:
> > > > >
> > > > > Hi Phil,
> > > > >

[...]

> > several things:
> >
> > your example above is safe IMO because when C is unthrottled, its
> > group se will be enqueued on B, which will be added to the leaf_cfs_rq
> > list.
>
> Sorry for the slightly late reply..
> I was lost here: B can directly be added to the leaf_cfs_rq list, and no
> intermediate cfs_rq will have its parent off the leaf_cfs_rq list.
>
> > Then the group se of B is already on_rq but A is throttled and the 1st
> > loop breaks. The 2nd loop will ensure that A is added to the leaf_cfs_rq
> > list.
> >
> > Now, if we add one more level between C and A, we have a problem and
> > we should add something similar in the else
>
> Yes, you are right. If one more level is added, the intermediate cfs_rq
> which is in the throttled hierarchy has a chance that its parent is not
> on the leaf_cfs_rq list. It then keeps changing tmp_alone_branch, leading
> to rq->tmp_alone_branch != &rq->leaf_cfs_rq_list, and we hit that assert.
> The tricky part here is that the throttled cfs_rq can be added back to the list.
>
> >
> > Finally, while checking unthrottle_cfs_rq(), the test "if
> > (!cfs_rq->load.weight) return" skips all the for_each_sched_entity loops
> > and can break the leaf_cfs_rq list.
>
> Nice catch.

After more thinking, it's not needed, because if load.weight == 0 then
nr_running is also 0: either no entity was enqueued, or the child
cfs_rq that is associated with the group entity has also been throttled
and its throttle_count will still be > 0.

>
> >
> > We need to jump to the last loop in such a case.
> >
> > >
> > > Another thing :
> > > In enqueue_task_fair():
> > >
> > >         for_each_sched_entity(se) {
> > >                 cfs_rq = cfs_rq_of(se);
> > >
> > >                 if (list_add_leaf_cfs_rq(cfs_rq))
> > >                         break;
> > >         }
> > >
> > > In unthrottle_cfs_rq():
> > >
> > >         for_each_sched_entity(se) {
> > >                 cfs_rq = cfs_rq_of(se);
> > >
> > >                 list_add_leaf_cfs_rq(cfs_rq);
> > >         }
> > >
> > > The difference between them is that if condition, add if
> > > condition to unthrottle_cfs_rq() may be an optimization and
> > > keep the same.
> >
> > Yes we can do the same kind of optimization
>
> Yes.
>
> Regard,
> Tao
>
> >
> > >
> > > > >
> > > > > Thanks,
> > > > > Tau
> > > > >
> > > > > >
> > > > > >  enqueue_throttle:
> > > > > > --
> > > > > > 2.18.0
> > > > > >
> > > > > > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Phil
> > > > > >
> > > > > > --
> > > > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-07 20:36 ` [PATCH v2] " Phil Auld
  2020-05-08 15:15   ` Tao Zhou
@ 2020-05-11 19:25   ` Vincent Guittot
  2020-05-11 20:44     ` Phil Auld
  1 sibling, 1 reply; 29+ messages in thread
From: Vincent Guittot @ 2020-05-11 19:25 UTC (permalink / raw)
  To: Phil Auld; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

On Thu, 7 May 2020 at 22:36, Phil Auld <pauld@redhat.com> wrote:
>
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list.  In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do
> what is needed because se no longer points to the sched_entity which broke
> out of the first loop.
>
> Address this by calling leaf_add_rq_list if there are throttled parents while
> doing the second for_each_sched_entity loop.
>

Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)

> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>

With the Fixes tag and the typo mentioned by Tao

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..c6d57c334d51 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto enqueue_throttle;
> +
> +               /*
> +                * One parent has been throttled and cfs_rq removed from the
> +                * list. Add it back to not break the leaf list.
> +                */
> +               if (throttled_hierarchy(cfs_rq))
> +                       list_add_leaf_cfs_rq(cfs_rq);
>         }
>
>  enqueue_throttle:
> --
> 2.18.0
>
> V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
>
>
> Cheers,
> Phil
>
> --
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 19:25   ` Vincent Guittot
@ 2020-05-11 20:44     ` Phil Auld
  2020-05-12  9:00       ` Dietmar Eggemann
  2020-05-12 14:06       ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-11 20:44 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
> On Thu, 7 May 2020 at 22:36, Phil Auld <pauld@redhat.com> wrote:
> >
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list.  In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> > what is needed because se no longer points to the sched_entity which broke
> > out of the first loop.
> >
> > Address this by calling leaf_add_rq_list if there are throttled parents while
> > doing the second for_each_sched_entity loop.
> >
> 
> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> 
> > Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Juri Lelli <juri.lelli@redhat.com>
> 
> With the Fixes tag and the typo mentioned by Tao
>

Right, that last line of the commit message should read "list_add_leaf_cfs_rq"


> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

Thanks Vincent.

Peter/Ingo, do you want me to resend or can you fix when applying?


Thanks,
Phil

> 
> > ---
> >  kernel/sched/fair.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..c6d57c334d51 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >                 /* end evaluation on encountering a throttled cfs_rq */
> >                 if (cfs_rq_throttled(cfs_rq))
> >                         goto enqueue_throttle;
> > +
> > +               /*
> > +                * One parent has been throttled and cfs_rq removed from the
> > +                * list. Add it back to not break the leaf list.
> > +                */
> > +               if (throttled_hierarchy(cfs_rq))
> > +                       list_add_leaf_cfs_rq(cfs_rq);
> >         }
> >
> >  enqueue_throttle:
> > --
> > 2.18.0
> >
> > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> >
> >
> > Cheers,
> > Phil
> >
> > --
> >
> 

-- 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 20:44     ` Phil Auld
@ 2020-05-12  9:00       ` Dietmar Eggemann
  2020-05-12 13:37         ` Phil Auld
  2020-05-12 14:06       ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Dietmar Eggemann @ 2020-05-12  9:00 UTC (permalink / raw)
  To: Phil Auld, Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

On 11/05/2020 22:44, Phil Auld wrote:
> On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
>> On Thu, 7 May 2020 at 22:36, Phil Auld <pauld@redhat.com> wrote:
>>>
>>> sched/fair: Fix enqueue_task_fair warning some more
>>>
>>> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
>>> did not fully resolve the issues with the rq->tmp_alone_branch !=
>>> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
>>> the first for_each_sched_entity loop exits due to on_rq, having incompletely
>>> updated the list.  In this case the second for_each_sched_entity loop can
>>> further modify se. The later code to fix up the list management fails to do
>>> what is needed because se no longer points to the sched_entity which broke
>>> out of the first loop.
>>>
>>> Address this by calling leaf_add_rq_list if there are throttled parents while
>>> doing the second for_each_sched_entity loop.
>>>
>>
>> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
>>
>>> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> Signed-off-by: Phil Auld <pauld@redhat.com>
>>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Juri Lelli <juri.lelli@redhat.com>
>>
>> With the Fixes tag and the typo mentioned by Tao
>>
> 
> Right, that last line of the commit message should read "list_add_leaf_cfs_rq"
> 
> 
>> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> 
> Thanks Vincent.
> 
> Peter/Ingo, do you want me to resend or can you fix when applying?


Maybe you could add that 'the throttled parent was already added back to
the list by a task enqueue in a parallel child hierarchy'.

IMHO, this is part of the description because otherwise the throttled
parent would have connected the branch.

And the not-adding of the intermediate child cfs_rq would have gone
unnoticed.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-12  9:00       ` Dietmar Eggemann
@ 2020-05-12 13:37         ` Phil Auld
  0 siblings, 0 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-12 13:37 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Vincent Guittot, Peter Zijlstra, linux-kernel, Ingo Molnar, Juri Lelli

Hi Dietmar,

On Tue, May 12, 2020 at 11:00:16AM +0200 Dietmar Eggemann wrote:
> On 11/05/2020 22:44, Phil Auld wrote:
> > On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
> >> On Thu, 7 May 2020 at 22:36, Phil Auld <pauld@redhat.com> wrote:
> >>>
> >>> sched/fair: Fix enqueue_task_fair warning some more
> >>>
> >>> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> >>> did not fully resolve the issues with the rq->tmp_alone_branch !=
> >>> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> >>> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> >>> updated the list.  In this case the second for_each_sched_entity loop can
> >>> further modify se. The later code to fix up the list management fails to do
> >>> what is needed because se no longer points to the sched_entity which broke
> >>> out of the first loop.
> >>>
> >>> Address this by calling leaf_add_rq_list if there are throttled parents while
> >>> doing the second for_each_sched_entity loop.
> >>>
> >>
> >> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> >>
> >>> Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
> >>> Signed-off-by: Phil Auld <pauld@redhat.com>
> >>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> >>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> >>> Cc: Ingo Molnar <mingo@kernel.org>
> >>> Cc: Juri Lelli <juri.lelli@redhat.com>
> >>
> >> With the Fixes tag and the typo mentioned by Tao
> >>
> > 
> > Right, that last line of the commit message should read "list_add_leaf_cfs_rq"
> > 
> > 
> >> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > 
> > Thanks Vincent.
> > 
> > Peter/Ingo, do you want me to resend or can you fix when applying?
> 
> 
> Maybe you could add that 'the throttled parent was already added back to
> the list by a task enqueue in a parallel child hierarchy'.
> 
> IMHO, this is part of the description because otherwise the throttled
> parent would have connected the branch.
> 
> And the not-adding of the intermediate child cfs_rq would have gone
> unnoticed.

Okay, I'll add that statement. For those curious, here are the lines from
about 70ms earlier in the trace where the throttled parent
(0xffffa085e48ce000) is added to the list.

bz1738415-test-6264  [005]  1271.315046: sched_waking:         comm=bz1738415-test pid=6269 prio=120 target_cpu=005
bz1738415-test-6264  [005]  1271.315048: sched_migrate_task:   comm=bz1738415-test pid=6269 prio=120 orig_cpu=5 dest_cpu=17
bz1738415-test-6264  [005]  1271.315050: bprint:               enqueue_task_fair: se 0xffffa081e6d7de80 on_rq 0 cfs_rq = 0xffffa085e48ce000
bz1738415-test-6264  [005]  1271.315051: bprint:               enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
bz1738415-test-6264  [005]  1271.315053: bprint:               enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent onlist  Set tmp_alone_branch to 0xffffa085ef92c868
bz1738415-test-6264  [005]  1271.315053: bprint:               enqueue_task_fair: current se = 0xffffa081e6d7de80, orig_se = 0xffffa081e6d7de80
bz1738415-test-6264  [005]  1271.315055: bprint:               enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
bz1738415-test-6264  [005]  1271.315056: sched_wake_idle_without_ipi: cpu=17

> 
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Thanks,

Phil


> 
> [...]
> 

-- 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-06 14:18 [PATCH] sched/fair: Fix enqueue_task_fair warning some more Phil Auld
  2020-05-06 16:36 ` Vincent Guittot
  2020-05-07 20:36 ` [PATCH v2] " Phil Auld
@ 2020-05-12 13:52 ` Phil Auld
  2020-05-12 14:10   ` Peter Zijlstra
  2020-05-19 18:44   ` [tip: sched/urgent] sched/fair: Fix enqueue_task_fair() " tip-bot2 for Phil Auld
  2 siblings, 2 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-12 13:52 UTC (permalink / raw)
  To: peterz
  Cc: linux-kernel, mingo, vincent.guittot, juri.lelli, zohooouoto,
	dietmar.eggemann

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

v3: clean up commit message and add fixes and review tags.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..c6d57c334d51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
+
+		/*
+		 * One parent has been throttled and cfs_rq removed from the
+		 * list. Add it back to not break the leaf list.
+		 */
+		if (throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 enqueue_throttle:
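
For context, the two throttling predicates used in this hunk check
different things; in kernels of this era they are defined in
kernel/sched/fair.c along these lines (again a sketch, not a verbatim
quote):

	/* is this cfs_rq itself throttled? */
	static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
	{
		return cfs_bandwidth_used() && cfs_rq->throttled;
	}

	/* check whether this cfs_rq, or any ancestor, is throttled */
	static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
	{
		return cfs_bandwidth_used() && cfs_rq->throttle_count;
	}

So the goto bails out when the cfs_rq itself is throttled, while the new
check catches a cfs_rq that is still runnable but hangs below a throttled
ancestor, which is exactly the case whose leaf-list linkage was being
lost.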
-- 
2.18.0


-- 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-11 20:44     ` Phil Auld
  2020-05-12  9:00       ` Dietmar Eggemann
@ 2020-05-12 14:06       ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2020-05-12 14:06 UTC (permalink / raw)
  To: Phil Auld; +Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Juri Lelli

On Mon, May 11, 2020 at 04:44:10PM -0400, Phil Auld wrote:
> Peter/Ingo, do you want me to resend or can you fix when applying?

I now have this, do I need more edits?

---

Subject: sched/fair: Fix enqueue_task_fair warning some more
From: Phil Auld <pauld@redhat.com>
Date: Thu, 7 May 2020 16:36:12 -0400

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se no longer points to the sched_entity which broke
out of the first loop.

Address this by calling list_add_leaf_cfs_rq if there are throttled
parents while doing the second for_each_sched_entity loop.

Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20200507203612.GF19331@lorien.usersys.redhat.com
---
 kernel/sched/fair.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
+
+		/*
+		 * One parent has been throttled and cfs_rq removed from the
+		 * list. Add it back to not break the leaf list.
+		 */
+		if (throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 enqueue_throttle:

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-12 13:52 ` [PATCH v3] " Phil Auld
@ 2020-05-12 14:10   ` Peter Zijlstra
  2020-05-12 14:24     ` Phil Auld
  2020-05-19 18:44   ` [tip: sched/urgent] sched/fair: Fix enqueue_task_fair() " tip-bot2 for Phil Auld
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2020-05-12 14:10 UTC (permalink / raw)
  To: Phil Auld
  Cc: linux-kernel, mingo, vincent.guittot, juri.lelli, zohooouoto,
	dietmar.eggemann

On Tue, May 12, 2020 at 09:52:22AM -0400, Phil Auld wrote:
> sched/fair: Fix enqueue_task_fair warning some more
> 
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list.  In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do
> what is needed because se does not point to the sched_entity which broke out
> of the first loop. The list is not fixed up because the throttled parent was
> already added back to the list by a task enqueue in a parallel child hierarchy.
> 
> Address this by calling list_add_leaf_cfs_rq if there are throttled parents
> while doing the second for_each_sched_entity loop.
> 
> v3: clean up commit message and add Fixes: and Reviewed-by: tags.

Excellent, ignore what I just sent, I now have this one.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more
  2020-05-12 14:10   ` Peter Zijlstra
@ 2020-05-12 14:24     ` Phil Auld
  0 siblings, 0 replies; 29+ messages in thread
From: Phil Auld @ 2020-05-12 14:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, juri.lelli, zohooouoto,
	dietmar.eggemann

On Tue, May 12, 2020 at 04:10:48PM +0200 Peter Zijlstra wrote:
> On Tue, May 12, 2020 at 09:52:22AM -0400, Phil Auld wrote:
> > sched/fair: Fix enqueue_task_fair warning some more
> > 
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list.  In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> > what is needed because se does not point to the sched_entity which broke out
> > of the first loop. The list is not fixed up because the throttled parent was
> > already added back to the list by a task enqueue in a parallel child hierarchy.
> > 
> > Address this by calling list_add_leaf_cfs_rq if there are throttled parents
> > while doing the second for_each_sched_entity loop.
> > 
> > v3: clean up commit message and add Fixes: and Reviewed-by: tags.
> 
> Excellent, ignore what I just sent, I now have this one.
> 

Thank you!


Cheers,
Phil
-- 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip: sched/urgent] sched/fair: Fix enqueue_task_fair() warning some more
  2020-05-12 13:52 ` [PATCH v3] " Phil Auld
  2020-05-12 14:10   ` Peter Zijlstra
@ 2020-05-19 18:44   ` tip-bot2 for Phil Auld
  1 sibling, 0 replies; 29+ messages in thread
From: tip-bot2 for Phil Auld @ 2020-05-19 18:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Phil Auld, Peter Zijlstra (Intel),
	Dietmar Eggemann, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     b34cb07dde7c2346dec73d053ce926aeaa087303
Gitweb:        https://git.kernel.org/tip/b34cb07dde7c2346dec73d053ce926aeaa087303
Author:        Phil Auld <pauld@redhat.com>
AuthorDate:    Tue, 12 May 2020 09:52:22 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 19 May 2020 20:34:10 +02:00

sched/fair: Fix enqueue_task_fair() warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20200512135222.GC2201@lorien.usersys.redhat.com
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b..c6d57c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
 			goto enqueue_throttle;
+
+		/*
+		 * One parent has been throttled and cfs_rq removed from the
+		 * list. Add it back to not break the leaf list.
+		 */
+		if (throttled_hierarchy(cfs_rq))
+			list_add_leaf_cfs_rq(cfs_rq);
 	}
 
 enqueue_throttle:

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-05-19 18:44 UTC | newest]

Thread overview: 29+ messages
2020-05-06 14:18 [PATCH] sched/fair: Fix enqueue_task_fair warning some more Phil Auld
2020-05-06 16:36 ` Vincent Guittot
2020-05-06 18:05   ` Phil Auld
2020-05-07 15:06     ` Vincent Guittot
2020-05-07 15:17       ` Phil Auld
2020-05-07 18:04       ` Phil Auld
2020-05-07 18:21         ` Vincent Guittot
2020-05-07 20:36 ` [PATCH v2] " Phil Auld
2020-05-08 15:15   ` Tao Zhou
2020-05-08 15:27     ` Vincent Guittot
2020-05-08 17:02       ` Tao Zhou
2020-05-11  8:36         ` Vincent Guittot
     [not found]           ` <BL0PR14MB37792D0FD629FFF1C9FEDE369AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
2020-05-11 19:22             ` Vincent Guittot
2020-05-11  8:40         ` Dietmar Eggemann
2020-05-11  9:36           ` Vincent Guittot
2020-05-11 10:39             ` Dietmar Eggemann
2020-05-11 12:12               ` Vincent Guittot
2020-05-11 17:02                 ` Dietmar Eggemann
2020-05-11 17:14                   ` Vincent Guittot
     [not found]               ` <BL0PR14MB3779ED5E2E5AD157B58D002C9AA10@BL0PR14MB3779.namprd14.prod.outlook.com>
2020-05-11 17:03                 ` Dietmar Eggemann
2020-05-11 19:25   ` Vincent Guittot
2020-05-11 20:44     ` Phil Auld
2020-05-12  9:00       ` Dietmar Eggemann
2020-05-12 13:37         ` Phil Auld
2020-05-12 14:06       ` Peter Zijlstra
2020-05-12 13:52 ` [PATCH v3] " Phil Auld
2020-05-12 14:10   ` Peter Zijlstra
2020-05-12 14:24     ` Phil Auld
2020-05-19 18:44   ` [tip: sched/urgent] sched/fair: Fix enqueue_task_fair() " tip-bot2 for Phil Auld
