From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755194AbbJ1ClB (ORCPT );
	Tue, 27 Oct 2015 22:41:01 -0400
Received: from smtp.codeaurora.org ([198.145.29.96]:39346 "EHLO
	smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752053AbbJ1ClA (ORCPT );
	Tue, 27 Oct 2015 22:41:00 -0400
Date: Tue, 27 Oct 2015 19:40:54 -0700
From: Joonwoo Park <joonwoop@codeaurora.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>, linux-kernel@vger.kernel.org,
	linux-arm-msm@vger.kernel.org, ohaugan@codeaurora.org
Subject: Re: [PATCH] sched: fix incorrect wait time and wait count statistics
Message-ID: <20151028024054.GA8839@codeaurora.org>
References: <1445750594-14078-1-git-send-email-joonwoop@codeaurora.org>
 <20151025102610.GQ2508@worktop.programming.kicks-ass.net>
 <562ED710.7040009@codeaurora.org>
 <20151027125728.GA3201@worktop.amr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151027125728.GA3201@worktop.amr.corp.intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 27, 2015 at 01:57:28PM +0100, Peter Zijlstra wrote:
> 
> (Excessive quoting for Olav)
> 
> On Mon, Oct 26, 2015 at 06:44:48PM -0700, Joonwoo Park wrote:
> > On 10/25/2015 03:26 AM, Peter Zijlstra wrote:
> > > Also note that on both sites we also set TASK_ON_RQ_MIGRATING -- albeit
> > > late. Can't you simply set that earlier (and back to QUEUED later) and
> > > test for task_on_rq_migrating() instead of blowing up the fastpath like
> > > you did?
> > 
> > Yes it's doable.  I also find it's much simpler.
> > From 98d615d46211a90482a0f9b7204265c54bba8520 Mon Sep 17 00:00:00 2001
> > From: Joonwoo Park <joonwoop@codeaurora.org>
> > Date: Mon, 26 Oct 2015 16:37:47 -0700
> > Subject: [PATCH v2] sched: fix incorrect wait time and wait count statistics
> > 
> > At present the scheduler resets a task's wait start timestamp when the
> > task migrates to another rq.  This misleads the scheduler into
> > reporting less wait time than actual, by omitting the time spent
> > waiting prior to migration, and a higher wait count than actual, by
> > counting migration as a wait end event.  Both can be seen via trace or
> > /proc/<pid>/sched with CONFIG_SCHEDSTATS=y.
> > 
> > Carry forward the migrating task's wait time accrued prior to
> > migration, and don't count migration as a wait end event, to fix such
> > statistics errors.
> > 
> > In order to determine whether a task is migrating, mark task->on_rq
> > with TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to
> > migration.
> > 
> > To: Ingo Molnar <mingo@kernel.org>
> > To: Peter Zijlstra <peterz@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> > ---
> 
> So now that you rely on TASK_ON_RQ_MIGRATING; I think you missed one
> place that can migrate sched_fair tasks and doesn't set it.
> 
> Olav recently did a patch adding TASK_ON_RQ_MIGRATING to _every_
> migration path, but that is (still) somewhat overkill. With your changes
> we need it for sched_fair though.
> 
> So I think you need to change __migrate_swap_task(), which is used by
> the NUMA scheduling to swap two running tasks.

Oh yes...  __migrate_swap_task() can migrate fair class tasks, so I
should mark them as TASK_ON_RQ_MIGRATING.  I will fix this in a
subsequent patch.
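Something along these lines, I think.  Untested sketch only, against the
current __migrate_swap_task(), marking the task around the
deactivate/activate pair the same way the other migration paths do:

	static void __migrate_swap_task(struct task_struct *p, int cpu)
	{
		if (task_on_rq_queued(p)) {
			struct rq *src_rq, *dst_rq;

			src_rq = task_rq(p);
			dst_rq = cpu_rq(cpu);

			/*
			 * Mark the task as migrating so that
			 * update_stats_wait_{start,end}() treat this as a
			 * migration rather than a wait start/end event.
			 */
			p->on_rq = TASK_ON_RQ_MIGRATING;
			deactivate_task(src_rq, p, 0);
			set_task_cpu(p, cpu);
			activate_task(dst_rq, p, 0);
			p->on_rq = TASK_ON_RQ_QUEUED;

			check_preempt_curr(dst_rq, p, 0);
		} else {
			/*
			 * Task isn't running anymore; make it appear like we
			 * migrated it before it went to sleep.
			 */
			p->wake_cpu = cpu;
		}
	}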
> 
> Also, it might be prudent to extend the CONFIG_SCHED_DEBUG ifdef in
> set_task_cpu() to test for this new requirement:
> 
> 	WARN_ON_ONCE(p->state == TASK_RUNNING &&
> 		     p->sched_class == &fair_sched_class &&
> 		     p->on_rq != TASK_ON_RQ_MIGRATING);
> 

I was going to argue that if we had Olav's change (with the revised
marking order), a dummy like me wouldn't need to worry about stale
on_rq state and probably wouldn't have made the mistake I did with
__migrate_swap_task().  But I don't think I can argue that anymore: my
patch will set TASK_ON_RQ_MIGRATING on all migration paths, at least
for fair class tasks, and moreover the check you suggested above would
enforce the new requirement.  I will add that check.
I recall Olav had some other reason for his patch though.

> > kernel/sched/core.c | 4 ++--
> > kernel/sched/fair.c | 17 ++++++++++++++---
> > 2 files changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bcd214e..d9e4ad5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1069,8 +1069,8 @@ static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int new
> >  {
> >  	lockdep_assert_held(&rq->lock);
> >  
> > -	dequeue_task(rq, p, 0);
> >  	p->on_rq = TASK_ON_RQ_MIGRATING;
> > +	dequeue_task(rq, p, 0);
> >  	set_task_cpu(p, new_cpu);
> >  	raw_spin_unlock(&rq->lock);
> >  
> > @@ -1078,8 +1078,8 @@ static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int new
> >  
> >  	raw_spin_lock(&rq->lock);
> >  	BUG_ON(task_cpu(p) != new_cpu);
> > -	p->on_rq = TASK_ON_RQ_QUEUED;
> >  	enqueue_task(rq, p, 0);
> > +	p->on_rq = TASK_ON_RQ_QUEUED;
> >  	check_preempt_curr(rq, p, 0);
> >  
> >  	return rq;
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9a5e60f..7609576 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -740,7 +740,11 @@ static void update_curr_fair(struct rq *rq)
> >  static inline void
> >  update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -	schedstat_set(se->statistics.wait_start, rq_clock(rq_of(cfs_rq)));
> > +	schedstat_set(se->statistics.wait_start,
> > +		      task_on_rq_migrating(task_of(se)) &&
> > +		      likely(rq_clock(rq_of(cfs_rq)) > se->statistics.wait_start) ?
> > +		      rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start :
> > +		      rq_clock(rq_of(cfs_rq)));
> 
> So I get that you want to avoid emitting code for !SCHEDSTATS; but that
> is rather unreadable. Either use the GCC stmt-expr or wrap the lot in
> #ifdef.
> 

Will do.

> >  }
> >  
> >  /*
> > @@ -759,6 +763,13 @@ static void update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  static void
> >  update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > +	if (task_on_rq_migrating(task_of(se))) {
> 
> Maybe add a comment that we adjust the wait_start time-base and will
> rebase on the new rq_clock in update_stats_wait_start().
> 

Will do.
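To be concrete, here is roughly what I have in mind for v3.  Untested
sketch only: the whole lot wrapped in #ifdef CONFIG_SCHEDSTATS as you
suggested, with the time-base comment added (exact factoring may still
change):

	#ifdef CONFIG_SCHEDSTATS
	static inline void
	update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		u64 wait_start = rq_clock(rq_of(cfs_rq));

		/*
		 * A migrating task carries its accumulated wait time in
		 * wait_start; rebase it on the new rq's clock.
		 */
		if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
		    likely(wait_start > se->statistics.wait_start))
			wait_start -= se->statistics.wait_start;

		se->statistics.wait_start = wait_start;
	}

	static void
	update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start;

		if (entity_is_task(se) && task_on_rq_migrating(task_of(se))) {
			/*
			 * We adjust the wait_start time-base here and will
			 * rebase it on the new rq_clock in
			 * update_stats_wait_start().  Migration is not a
			 * wait end event, so leave wait_max/count/sum alone.
			 */
			se->statistics.wait_start = delta;
			return;
		}

		se->statistics.wait_max = max(se->statistics.wait_max, delta);
		se->statistics.wait_count++;
		se->statistics.wait_sum += delta;
		se->statistics.wait_start = 0;

		if (entity_is_task(se))
			trace_sched_stat_wait(task_of(se), delta);
	}
	#else
	static inline void
	update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
	}
	static inline void
	update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
	}
	#endif

The entity_is_task() checks are there because group entities have no
task_struct behind them, so task_of() would be bogus for those.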
Thanks,
Joonwoo

> > +		schedstat_set(se->statistics.wait_start,
> > +			      rq_clock(rq_of(cfs_rq)) -
> > +			      se->statistics.wait_start);
> > +		return;
> > +	}
> > +
> >  	schedstat_set(se->statistics.wait_max, max(se->statistics.wait_max,
> >  			rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start));
> >  	schedstat_set(se->statistics.wait_count, se->statistics.wait_count + 1);
> > @@ -5656,8 +5667,8 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
> >  {
> >  	lockdep_assert_held(&env->src_rq->lock);
> >  
> > -	deactivate_task(env->src_rq, p, 0);
> >  	p->on_rq = TASK_ON_RQ_MIGRATING;
> > +	deactivate_task(env->src_rq, p, 0);
> >  	set_task_cpu(p, env->dst_cpu);
> >  }
> >  
> > @@ -5790,8 +5801,8 @@ static void attach_task(struct rq *rq, struct task_struct *p)
> >  	lockdep_assert_held(&rq->lock);
> >  
> >  	BUG_ON(task_rq(p) != rq);
> > -	p->on_rq = TASK_ON_RQ_QUEUED;
> >  	activate_task(rq, p, 0);
> > +	p->on_rq = TASK_ON_RQ_QUEUED;
> >  	check_preempt_curr(rq, p, 0);
> >  }

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation