* [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
@ 2020-09-30  2:47 qianjun.kernel
  2020-09-30  8:19 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: qianjun.kernel @ 2020-09-30  2:47 UTC
  To: mingo, peterz, vincent.guittot, juri.lelli, linux-kernel
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, jun qian, Yafang Shao

From: jun qian <qianjun.kernel@gmail.com>

When sched_schedstat changes from 0 to 1, some sched entities (se) may
already be on the runqueue with se->statistics.wait_start still 0.
That makes rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start
wrong. We need to avoid this scenario.

Signed-off-by: jun qian <qianjun.kernel@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 658aa7a..dd7c3bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -908,6 +908,14 @@ static void update_curr_fair(struct rq *rq)
 	if (!schedstat_enabled())
 		return;
 
+	/*
+	 * When sched_schedstat changes from 0 to 1, some sched entities may
+	 * already be on the runqueue with se->statistics.wait_start still 0.
+	 * That makes the delta wrong. We need to avoid this scenario.
+	 */
+	if (unlikely(!schedstat_val(se->statistics.wait_start)))
+		return;
+
 	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
 
 	if (entity_is_task(se)) {
-- 
1.8.3.1


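The effect described in the changelog can be reproduced outside the
kernel. The following is a minimal userspace sketch, not kernel code:
struct wait_stats, wait_end_unguarded() and wait_end_guarded() are
hypothetical names that only model the wait_start arithmetic, and the
clock value is an arbitrary example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the wait-time fields of the se statistics. */
struct wait_stats {
	uint64_t wait_start;	/* clock when waiting began, 0 if never recorded */
	uint64_t wait_sum;	/* accumulated wait time */
	uint64_t wait_count;	/* number of samples accounted */
};

/* Unguarded update: trusts wait_start even if it was never recorded. */
static void wait_end_unguarded(struct wait_stats *ws, uint64_t now)
{
	uint64_t delta = now - ws->wait_start;

	ws->wait_sum += delta;
	ws->wait_count++;
	ws->wait_start = 0;
}

/* Guarded update: drops the sample when wait_start is still 0. */
static void wait_end_guarded(struct wait_stats *ws, uint64_t now)
{
	if (!ws->wait_start)
		return;
	wait_end_unguarded(ws, now);
}
```

With wait_start left at 0, calling wait_end_unguarded(&ws, now) accounts
the whole clock value "now" as wait time, while wait_end_guarded() leaves
wait_sum and wait_count untouched.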

* Re: [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
  2020-09-30  2:47 [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time qianjun.kernel
@ 2020-09-30  8:19 ` Peter Zijlstra
  2020-09-30  9:16   ` jun qian
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-09-30  8:19 UTC
  To: qianjun.kernel
  Cc: mingo, vincent.guittot, juri.lelli, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, mgorman, Yafang Shao

On Wed, Sep 30, 2020 at 10:47:12AM +0800, qianjun.kernel@gmail.com wrote:
> From: jun qian <qianjun.kernel@gmail.com>
> 
> When the sched_schedstat changes from 0 to 1, some sched se maybe
> already in the runqueue, the se->statistics.wait_start will be 0.
> So it will let the (rq_of(cfs_rq)) - se->statistics.wait_start)
> wrong. We need to avoid this scenario.

Is this really the only problem there? Did you do a full audit of that
schedstat nonsense?

> Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/sched/fair.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 658aa7a..dd7c3bb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -908,6 +908,14 @@ static void update_curr_fair(struct rq *rq)

your git-diff is 'funny', it got the function ^ wrong.

>  	if (!schedstat_enabled())
>  		return;
>  
> +	/*
> +	 * When the sched_schedstat changes from 0 to 1, some sched se maybe
> +	 * already in the runqueue, the se->statistics.wait_start will be 0.
> +	 * So it will let the delta wrong. We need to avoid this scenario.
> +	 */
> +	if (unlikely(!schedstat_val(se->statistics.wait_start)))
> +		return;
> +
>  	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
>  
>  	if (entity_is_task(se)) {
> -- 
> 1.8.3.1
> 


* Re: [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
  2020-09-30  8:19 ` Peter Zijlstra
@ 2020-09-30  9:16   ` jun qian
  2020-09-30  9:57     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: jun qian @ 2020-09-30  9:16 UTC
  To: Peter Zijlstra
  Cc: mingo, Vincent Guittot, juri.lelli, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, mgorman, Yafang Shao

On Wed, Sep 30, 2020 at 4:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Sep 30, 2020 at 10:47:12AM +0800, qianjun.kernel@gmail.com wrote:
> > From: jun qian <qianjun.kernel@gmail.com>
> >
> > When the sched_schedstat changes from 0 to 1, some sched se maybe
> > already in the runqueue, the se->statistics.wait_start will be 0.
> > So it will let the (rq_of(cfs_rq)) - se->statistics.wait_start)
> > wrong. We need to avoid this scenario.
>
> Is this really the only problem there? Did you do a full audit of that
> schedstat nonsense?
>

Did you mean that the xxx_start fields behind the other sched_stat_xxx
statistics (sched_stat_sleep, sched_stat_iowait, sched_stat_blocked,
sched_stat_runtime) may also depend on schedstat_enabled?
I searched the code and found that, except for wait_start, none of
these xxx_start fields depend on schedstat_enabled.

This patch is meant to solve the following problem: right after
schedstat is enabled, the first sched_stat_wait value reported for the
probed process is very likely to be unbelievably large.

> > Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/sched/fair.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 658aa7a..dd7c3bb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -908,6 +908,14 @@ static void update_curr_fair(struct rq *rq)
>
> your git-diff is 'funny', it got the function ^ wrong.
>

sorry  :)

> >       if (!schedstat_enabled())
> >               return;
> >
> > +     /*
> > +      * When the sched_schedstat changes from 0 to 1, some sched se maybe
> > +      * already in the runqueue, the se->statistics.wait_start will be 0.
> > +      * So it will let the delta wrong. We need to avoid this scenario.
> > +      */
> > +     if (unlikely(!schedstat_val(se->statistics.wait_start)))
> > +             return;
> > +
> >       delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
> >
> >       if (entity_is_task(se)) {
> > --
> > 1.8.3.1
> >


* Re: [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
  2020-09-30  9:16   ` jun qian
@ 2020-09-30  9:57     ` Peter Zijlstra
  2020-09-30 13:08       ` jun qian
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-09-30  9:57 UTC
  To: jun qian
  Cc: mingo, Vincent Guittot, juri.lelli, linux-kernel,
	dietmar.eggemann, rostedt, bsegall, mgorman, Yafang Shao

On Wed, Sep 30, 2020 at 05:16:29PM +0800, jun qian wrote:
> On Wed, Sep 30, 2020 at 4:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Sep 30, 2020 at 10:47:12AM +0800, qianjun.kernel@gmail.com wrote:
> > > From: jun qian <qianjun.kernel@gmail.com>
> > >
> > > When the sched_schedstat changes from 0 to 1, some sched se maybe
> > > already in the runqueue, the se->statistics.wait_start will be 0.
> > > So it will let the (rq_of(cfs_rq)) - se->statistics.wait_start)
> > > wrong. We need to avoid this scenario.
> >
> > Is this really the only problem there? Did you do a full audit of that
> > schedstat nonsense?
> >
> 
> Did you mean that the sched_stat_xxx's xxx_start(sched_stat_sleep
> sched_stat_iowait sched_stat_blocked
> sched_stat_runtime) may be also depend the schedstat_enabled?

Yeah, this runtime schedstat_enabled thing is fairly recent, it used to
be an always on/off kinda thing.

At the time we figured inconsistencies from dynamically
enabling/disabling it were okay, it's just stats after all.

But if you now want to 'fix' that, then a full audit might be nice.

> I have searched the codes, and found that these sched_stat_xxx's
> xxx_start don't depend the schedstat_enabled
> except the wait_start.

OK, so you did the audit and only found this one issue? That's good
Changelog material :-)

Thanks!


* Re: [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
  2020-09-30  9:57     ` Peter Zijlstra
@ 2020-09-30 13:08       ` jun qian
  2020-10-09  9:33         ` jun qian
  0 siblings, 1 reply; 6+ messages in thread
From: jun qian @ 2020-09-30 13:08 UTC
  To: Peter Zijlstra
  Cc: mingo, Vincent Guittot, juri.lelli, linux-kernel,
	dietmar.eggemann, rostedt, Benjamin Segall, mgorman, Yafang Shao

On Wed, Sep 30, 2020 at 5:57 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Sep 30, 2020 at 05:16:29PM +0800, jun qian wrote:
> > On Wed, Sep 30, 2020 at 4:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Sep 30, 2020 at 10:47:12AM +0800, qianjun.kernel@gmail.com wrote:
> > > > From: jun qian <qianjun.kernel@gmail.com>
> > > >
> > > > When the sched_schedstat changes from 0 to 1, some sched se maybe
> > > > already in the runqueue, the se->statistics.wait_start will be 0.
> > > > So it will let the (rq_of(cfs_rq)) - se->statistics.wait_start)
> > > > wrong. We need to avoid this scenario.
> > >
> > > Is this really the only problem there? Did you do a full audit of that
> > > schedstat nonsense?
> > >
> >
> > Did you mean that the sched_stat_xxx's xxx_start(sched_stat_sleep
> > sched_stat_iowait sched_stat_blocked
> > sched_stat_runtime) may be also depend the schedstat_enabled?
>
> Yeah, this runtime schedstat_enabled thing is fairly recent, it used to
> be an always on/off kinda thing.
>
> At the time we figured inconsistencies from dynamically
> enabling/disabling it were okay, it's just stats after all.
>
> But if you now want to 'fix' that, then a full audit might be nice.
>
> > I have searched the codes, and found that these sched_stat_xxx's
> > xxx_start don't depend the schedstat_enabled
> > except the wait_start.
>
> OK, so you did the audit and only found this one issue? That's good
> Changelog material :-)
>

I found another problem. When sched_schedstat changes from 1 to 0, a
sched se that is already on the runqueue keeps the value already
stored in statistics.wait_start. If sched_schedstat then changes from
0 to 1 again, wait_start is not 0, so
rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start
is still computed and the wait time comes out bigger than the real
one. So we need to modify the patch to handle this scenario as well.

So I really do need to do a full audit :-)

Thanks

> Thanks!

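One way the stale wait_start described above could arise can be
sketched in userspace. This is a simplified model under assumptions, not
kernel code: struct entity_model, stats_enabled and the model_*()
helpers are hypothetical names, the timestamps are arbitrary, and the
sequence (run and wait again while the statistics are off) is one
possible interleaving rather than the only one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the wait accounting for one entity. */
struct entity_model {
	uint64_t wait_start;
	uint64_t wait_sum;
};

static bool stats_enabled;	/* stands in for the runtime schedstat switch */

/* Entity starts waiting: timestamp is recorded only while stats are on. */
static void model_wait_start(struct entity_model *e, uint64_t now)
{
	if (!stats_enabled)
		return;
	e->wait_start = now;
}

/* Entity stops waiting: includes the wait_start != 0 guard from the patch. */
static void model_wait_end(struct entity_model *e, uint64_t now)
{
	if (!stats_enabled)
		return;
	if (!e->wait_start)
		return;
	e->wait_sum += now - e->wait_start;
	e->wait_start = 0;
}

/* Replays the 1 -> 0 -> 1 toggle and returns the accounted wait time. */
static uint64_t model_stale_wait(void)
{
	struct entity_model e = {0};

	stats_enabled = true;
	model_wait_start(&e, 100);	/* wait_start = 100 */

	stats_enabled = false;
	model_wait_end(&e, 200);	/* skipped, wait_start stays 100 */
	model_wait_start(&e, 1000);	/* waits again, not recorded */

	stats_enabled = true;
	model_wait_end(&e, 2000);	/* real wait is 2000 - 1000 = 1000 */

	return e.wait_sum;		/* accounted: 2000 - 100 = 1900 */
}
```

In this model the guard does not help, because wait_start is non-zero
but stale: 1900 time units are accounted although the entity only waited
1000.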

* Re: [PATCH 1/1] sched/fair: Fix the wrong sched_stat_wait time
  2020-09-30 13:08       ` jun qian
@ 2020-10-09  9:33         ` jun qian
  0 siblings, 0 replies; 6+ messages in thread
From: jun qian @ 2020-10-09  9:33 UTC
  To: Peter Zijlstra
  Cc: mingo, Vincent Guittot, juri.lelli, linux-kernel,
	dietmar.eggemann, rostedt, Benjamin Segall, mgorman, Yafang Shao

On Wed, Sep 30, 2020 at 9:08 PM jun qian <qianjun.kernel@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 5:57 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Sep 30, 2020 at 05:16:29PM +0800, jun qian wrote:
> > > On Wed, Sep 30, 2020 at 4:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Sep 30, 2020 at 10:47:12AM +0800, qianjun.kernel@gmail.com wrote:
> > > > > From: jun qian <qianjun.kernel@gmail.com>
> > > > >
> > > > > When the sched_schedstat changes from 0 to 1, some sched se maybe
> > > > > already in the runqueue, the se->statistics.wait_start will be 0.
> > > > > So it will let the (rq_of(cfs_rq)) - se->statistics.wait_start)
> > > > > wrong. We need to avoid this scenario.
> > > >
> > > > Is this really the only problem there? Did you do a full audit of that
> > > > schedstat nonsense?
> > > >
> > >
> > > Did you mean that the sched_stat_xxx's xxx_start(sched_stat_sleep
> > > sched_stat_iowait sched_stat_blocked
> > > sched_stat_runtime) may be also depend the schedstat_enabled?
> >
> > Yeah, this runtime schedstat_enabled thing is fairly recent, it used to
> > be an always on/off kinda thing.
> >
> > At the time we figured inconsistencies from dynamically
> > enabling/disabling it were okay, it's just stats after all.
> >
> > But if you now want to 'fix' that, then a full audit might be nice.
> >
> > > I have searched the codes, and found that these sched_stat_xxx's
> > > xxx_start don't depend the schedstat_enabled
> > > except the wait_start.
> >
> > OK, so you did the audit and only found this one issue? That's good
> > Changelog material :-)
> >
>
> I found another problem, when the sched_schedstat changes from 1 to 0,
> the sched se
> which is already in the runqueue, the statistics.wait_start already
> has a value. At this moment
> sched_schedstat changes from 0 to 1 again, the (rq_of(cfs_rq)) -
> se->statistics.wait_start)
> will not be 0 and the wait time will be bigger than the real one. So
> we need to modify the  patch
> to resolve the problem with this scenario.
>

Hi Peter,

I have sent another patch to improve the accuracy of the sched_stat_wait
statistics.
I do not have a good idea for solving the other problem I described
above, but the new patch does improve the accuracy.

Thanks

> So I really need a full audit might, :-)
>
> Thanks
>
> > Thanks!
